Safety doesn’t happen by accident


To create a strong biosafety culture, information on mishaps involving deadly pathogens must be
reported and shared fully and transparently.


While the US Centers for Disease Control and Prevention
was investigating an accident involving anthrax that
happened at a lab on its Atlanta, Georgia, campus in June,
the agency’s director Thomas Frieden got a nasty surprise. Another acci- 
dent, this time involving the deadly H5N1 avian influenza virus, had 
been discovered at a CDC laboratory six weeks previously but had not 
been reported at the time. Frieden was angry, and rightly so. But this was 
not just a one-off — biosafety experts contend that many such incidents 
in secure labs worldwide go unreported. 

The CDC accidents raised many justified concerns, but they also led 
to some undue worries in the media and to political grandstanding. 
The risks posed by pathogens kept in high-biocontainment labs need 
to be kept in perspective. Many such agents are poorly transmissible, 
so pose mostly local threats — as well as the risk that they will be stolen 
and used in bioterrorism. Few are highly transmissible and able to 
spark epidemics of global significance. 

But some pathogens do pose such risks. In July 2003, a sustained 
public-health effort probably stopped the SARS (severe acute respira- 
tory syndrome) virus from causing a pandemic. But a few months later, 
lab accidents infected researchers in Taiwan and Singapore. And the 
following year, the virus was accidentally released from a lab in China 
and infected a researcher, then spread to her mother — who died — and 
a nurse. A pandemic could well have resulted.

If staff and public health are to be protected, then accidents must 
be reported in full, and the long-standing lack of progress here must 
end. As a News article on page 515 reports, many accidents are caused
not by a lack of physical barriers or regulations, but by the absence of a
strong biosafety culture in labs and their oversight bodies. 


A key part of such a culture is timely knowledge of all accidents and
their causes. That way, organizations everywhere can quickly take on 
board the lessons learned. The International Federation of Biosafety 
Associations, among others, has proposed the creation of an inter- 
national system for sharing such information confidentially, but the 
meagre funding needed has not been forthcoming. 

A confidential system would be a start, and deserves support, but it 
is not enough. Regulatory and oversight bodies throughout the world 
should require the reporting of all serious accidents and near misses in 
biocontainment labs, and in particular those that occur in labs with the 
highest biosafety levels. Timely incident reports should also be made 
available on public websites — as many nuclear regulators require of 
power plants — perhaps with an option for sharing details and more- 
sensitive information confidentially. 

Researchers must be given incentives to report accidents. A strong 
biosafety culture would clearly communicate and enforce the rules 
of play. Negligence should be disciplined, but researchers who have 
accidents while acting in good faith should not be penalized unfairly. 
Some of the current media and political reaction to the CDC accidents 
and the calls for disciplinary action against the researchers involved 
is unhelpful and potentially unjustified. On 22 July, Michael Farrell 
resigned as head of the CDC's Bioterror Rapid Response and Advanced 
Technology Laboratory in Atlanta, and other heads may roll, too. 

As one biosafety expert told Nature, the current criticism of the people 
involved means that most researchers would probably now think twice 
about reporting an accident. This blame game is unhelpful. What is 
more important, and in everyone’s interests, is to prevent future accidents.
And that requires full data on accidents and why they happen.


Fishy business 


Delays in approving genetically engineered 
salmon may be a taste of worse to come. 


The mood at AquaBounty Technologies a year ago was buoyant.
Regulators had released a draft assessment of the company’s
genetically engineered salmon, which grow faster than normal,
and found them to be environmentally benign. A few months after the
assessment's comment period closed, the company began to raise more
than 6,000 kilograms of salmon at its facility in Panama, in anticipation
of the final approval by the US Food and Drug Administration (FDA)
that would open the gates and allow the fish onto supermarket shelves.
That optimism now lies buried alongside the fish, which were culled
when the approval failed to come through. The FDA says that it is still
processing the more than 35,000 public comments made in response
to the draft assessment. But for AquaBounty, based in Maynard, Mas- 
sachusetts, this is just the latest in a series of delays spanning nearly 
20 years (see Nature 497, 17-18; 2013). Many of the FDA’s deliberations
have taken place behind closed doors, fuelling confusion as to the
cause of the setbacks, and rumours of political interference. 

As the delays have dragged on, the technology used to make Aqua- 
Bounty’s salmon has become outdated. In the current excitement over 
targeted gene editing that allows researchers to modify individual 
genes without leaving traces of foreign DNA, AquaBounty’s salmon 
— which contain a gene from another species — seem like a relic. 

But the company’s experience may hold a cautionary message. The 
FDA has not yet announced how it will evaluate animals engineered 
with gene-editing techniques. Its discussions are again occurring in 
private, leaving frustrated researchers to wonder whether the fruits of 
these technologies will meet the same fate as the beleaguered salmon. 
The FDA should learn from past experiences, bring these discussions 
before the public, and leave political considerations at the door.
week in this journal, psychiatric researchers have uncovered a
spread of genetic clues to schizophrenia, potentially shedding
some biochemical light on how this dreadful disease develops. At the 
same time, a leading US centre for research on mental-health disorders 
announced a record US$650-million donation from philanthropist 
Ted Stanley to boost that work (see Nature 511, 393; 2014). 

Good news all round. And more could yet follow: genetic under- 
standing of psychiatric disorders, together with more research on the 
unusual ebb and flow of circuits in the brain, promise a revolution. 
Researchers of brain disorders compare the current state of their 
science to knowledge of cancer a decade or so ago, before molecular 
approaches could stratify patients and select specific treatments. 

The latest study on schizophrenia could be 
a small step forward in this march. Or it could 
be another false start in a field that has endured 
more than its fair share. Psychiatric research has 
yet to provide a single reliable biomarker to aid 
diagnosis and treatment. Self-reported symp- 
toms and their subjective interpretations remain 
the basis for clinical diagnosis. Drug companies 
have walked away. The task of unravelling the 
biological pathways that drive mental illness, 
which are needed before drug targets can be 
identified, has been declared too difficult and 
too expensive. 

Of course, some perspective is needed. Psy- 
chiatric research had a long and painful birth. 
Just a generation or two ago, at a time when 
physicists had split the atom and biologists were 
deciphering the structure of DNA, a common treatment for schizo- 
phrenia and other mental disorders was a metal spike hammered up 
through the top of the eye socket and waggled around. With such a 
history, a lag of a mere decade or so behind cancer research can be 
taken as a sign of rapid progress. 

Whether or not the latest study on the genetics of schizophrenia 
takes that progress forward, it has already contributed to the public 
debate around mental illness and public understanding of the issues. 
It has raised and highlighted the ‘C-word’: cause. 

I have obsessive-compulsive disorder (OCD). That used to be a 
secret, but in April I published a book about the condition and my 
experiences of it. Despite its frequent portrayal as a behavioural quirk, 
OCD is a vicious and debilitating mental illness, with some similarities
to the experiences of schizophrenia. Simply put, people with OCD can 
have some of the same dark ideas, thoughts and 
Cause is not everything
in mental illness 


Welcome steps have been made in uncovering a biological basis for schizophrenia, 
but for many, the question of ‘why’ is unimportant, says David Adam. 


of such insight, and people with the condition typically attribute the 
intrusions to an external source.) 

I now give talks about my OCD. A frequent question from the audience
is one that I am still ill-prepared to answer: “What caused it?”

I don't know, and more to the point I don't care. For 20 years or so I 
have battled the symptoms. More recently, I sought and received treat- 
ment for those symptoms — a high daily dose of the antidepressant 
sertraline hydrochloride and several months’ worth of weekly sessions 
of cognitive behavioural therapy. It seemed to work, and without any- 
one — psychiatrists, psychologists or me — trying to identify the cause. 

Perhaps the question from others is down to simple curiosity. I tell a 
human story and it is natural to want to know how such stories begin. 
Maybe there is a degree of self-interest because people do not want to 
follow the path that I did. It could be me who is 
unusual in not caring about a cause, but when I 
find out that people have cancer or heart disease 
or have had a stroke, the cause of their suffering 
is pretty far down my list of enquiries. In the past 
two or three years, I have met lots of other people 
with OCD and other mental disorders. Many of 
them, like me, do not know and do not seem to 
care about the who, the where, the why and the 
when of their illness. There is only how. 

The other questions are not sinister. Instead, 
I think that they reflect an enduring mystery of 
mental illness. We do not know enough about 
the mind and the brain to build the backstory. 
(And as I said earlier, existing treatments do 
not require it.) Into this unknown creep the 
myths, the misunderstandings and the agen- 
das. In psychoanalysis, for example, as devised by Sigmund Freud, 
cause is everything and, sure enough, psychoanalysts usually find a 
subconscious cause for a mental disorder that can be conveniently 
addressed by — oh, psychoanalysis. 

The latest schizophrenia study helps to plug that causation gap. 
Schizophrenia has problems in the way that it is portrayed in the wider 
media, but the condition does escape the worst of the trivialization that 
plagues other forms of mental illness such as depression and OCD. 
No ignorant and patronizing opinion pieces have been penned in light 
of these latest developments to claim (as happens with depression, 
for instance) that the scientists are wrong and that schizophrenia is 
actually all about societal context and drug-company conspiracy. It 
is clearly an awful illness, and it — and by extension, other mental 
disorders — clearly has biological roots. 

To expose those roots might lead to new treatments in future, or it 
might not. Either way, it helps.


David Adam is Editorial and Column editor for Nature in London. 
e-mail: d.adam@nature.com 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


Data mix-up in 
sea-ice record 


The recent, mysterious 
expansion of Antarctic ice 
could be overestimated 
because of a data-analysis error,
according to US scientists. 

Ian Eisenman at the Scripps 
Institution of Oceanography 
in La Jolla, California, and his 
colleagues found the mistake 
when they compared two 
versions of satellite data on 
Southern Hemisphere sea ice 
that were calibrated differently. 
The incorrect calibration of one 
of the data sets might account 
for more than half of the jump 
in Antarctic sea-ice growth. 

The finding means 
that either the 2007 or 
the 2013 report by the 
Intergovernmental Panel on 
Climate Change reflects this 
error, but the authors were not 
able to determine which one. 
Cryosphere 8, 1289-1296 (2014) 
For a longer story on this research, 
see go.nature.com/owasxc 


MATERIALS 


Sponge takes light 
to make steam 


A sponge-like device absorbs 
water and solar energy to 
generate steam efficiently 
(pictured). 

Gang Chen at the 
Massachusetts Institute of 
Technology in Cambridge and
his colleagues placed a layer
of graphite flakes on top of
a piece of carbon foam. The
foam floats in water, soaking
it up and wicking the liquid to
the graphite, which absorbs
solar radiation. Thanks to the
insulating foam, heat builds up
in the porous graphite layer,
causing the absorbed water to
evaporate.

The apparatus can trap
85% of the incoming solar
energy to generate steam when
sunlight is focused to ten
times its normal intensity. The
device could help to purify
and desalinate water in remote
areas, the authors say.
Nature Commun. 5, 4449 (2014)


ECOLOGY


Predictable patterns for coral-reef pest


Modelling how ocean currents spread the larvae
of coral-eating starfish around Australia’s Great
Barrier Reef can help to identify areas that are
prone to damaging epidemics of the pest.

A team led by Karlo Hock of the University
of Queensland in St Lucia, Australia, used
a computer model to study the distribution
of larvae of the voracious crown-of-thorns
starfish (Acanthaster planci; pictured) in the
Great Barrier Reef. Areas of the reef that are
densely connected to each other through
ocean currents were more likely to experience
an outbreak, and to amplify it into a wider
problem. The authors’ model also accurately
identified the specific region where epidemics
most often originate.

The team suggests that careful study of
reef connectivity could help to control future
starfish outbreaks.
J. Appl. Ecol. http://doi.org/tvs (2014)


Fighting ants
make rare fluid


An ionic liquid has been
observed for the first time
in nature — as a mixture of
venom from two rival ant
species.

The tawny crazy ant
Nylanderia fulva is displacing
fire ants (Solenopsis invicta)
in the southern United States,
in part by detoxifying its
enemy's venom using its own
poison. Researchers led by
James Davis of the University
of South Alabama in Mobile
show that the resulting brew
consists of ions instead of
electrically neutral molecules.
The finding suggests that
ionic liquids, which are
commonly used in industry,
also have important biological
functions.
Angew. Chem. Int. Ed. http://doi.org/f2s5zj (2014)


GEOMORPHOLOGY 


Beaches erode 
without storms 


Sea-level rises that are 
unrelated to major storm 
events could be eroding 
coastlines as much as 
hurricanes do. 


Weather and oceanographic
processes that are not linked to 
storms cause sea levels to rise 
over weeks to months, but their 
effects have been overlooked 
in models of beach erosion. 

So Ethan Theuerkauf and his 
colleagues at the University of 
North Carolina’s Institute of 
Marine Sciences in Morehead 
City studied sediment cores 
from six sites along Onslow 
Beach on the US east coast 
after a year of frequent sea- 
level changes but no major 
storms. They compared these 
cores with those obtained after 
a storm year and found similar
levels of erosion. 

The authors suggest that sea- 
level changes could become 
more frequent in this region 
because climate change is 
predicted to weaken the Gulf 
Stream, which can lead to these 
sea-level anomalies. 

Geophys. Res. Lett. http://doi.org/ttn (2014)


Inflammation on 
the clock 


An internal clock regulates 
inflammation in mouse lungs. 

Symptoms of some human 
lung diseases, including 
asthma, tend to vary in 
severity according to the time 
of day. Andrew Loudon and 
David Ray at the University 
of Manchester, UK, and their 
colleagues found that immune 
responses to a bacterial toxin 
are regulated by a circadian 
clock in mouse lungs. The 
recruitment of immune cells 
called neutrophils and the 
expression of several immune- 
related proteins responded 
rhythmically to the toxin, 
with neutrophil recruitment 
peaking at dawn. 

Deleting a key ‘clock 
gene’ weakened responses 
to bacterial infection 
and reduced the effect of 
glucocorticoid steroids, 
which normally suppress 
inflammation. Chronic 
lung inflammation could be 
partly caused by circadian 
disruption, the authors say. 
Nature Med. http://dx.doi.org/10.1038/nm.3599 (2014)


Warming from 
coupled climates 


Links between the climate 
over the North Pacific and the 
North Atlantic oceans could 
lead to abrupt climate change. 

Researchers have debated 
whether temperature and 
ocean fluctuations were in 
sync with each other during 
past climate changes. Summer 
Praetorius and Alan Mix of 
Oregon State University in 
Corvallis studied oxygen 
isotopes as a proxy for ocean 
temperature in three sediment 
cores from the Gulf of Alaska 
covering the past 18,000 years. 

By comparing the Alaska 
samples to cores from northern 
Greenland, the scientists 
found that climate variables 
such as temperature changed 
synchronously between about 
15,500 and 11,000 years ago — 
shortly before the end of the 
last ice age. 

The authors suggest that this 
link could have shifted heat in 
both oceans towards the poles 
at the same time, triggering 
abrupt climate change. They 
add that similar connections 
may be important for future 
warming. 

Science 345, 444-448 (2014) 


OPTICS 


Transistor uses 
single photons 


Two teams in Germany have 
built transistors that control 
light at the single-photon level. 

Transistors that switch 
light instead of electrical 
current can enable ultra- 
fast computing. But making 
optical transistors with ‘gain’ 
— when one photon affects 
many others to drive further 
switches — has been tricky 
because photons do not 
interact with each other. 

To overcome this problem, 
a team at the Max Planck 
Institute of Quantum Optics 
in Garching and a separate 
group at the University of 
Stuttgart passed a single 
photon through a cloud of 
ultracold rubidium atoms. 
The photon converted one
atom into a type of large,
excited particle called a
Rydberg atom, which blocked
the next photon from passing
through.

In the Stuttgart team’s
transistor, one photon diverted
another 10, whereas in the
Max Planck device, a photon
controlled a further 20.

Phys. Rev. Lett. 113, 053602
(2014); 113, 053601 (2014)


SOCIAL SELECTION


Popular articles
on social media


Beef’s big impact on Earth


Beef is suddenly big on social media, thanks to two recent
papers investigating the global effects of livestock farming.
They make the case that beef production has a bigger impact
on greenhouse-gas emissions and on the use of nitrogen
and water than does the production of pork and poultry, for
instance. Tim Thomson, a physician and molecular biologist at
the Molecular Biology Institute of Barcelona in Spain, tweeted:
“Do not imitate Americans: Eat less beef and you will mitigate
environmental costs of diet.” But Jared Decker, a beef-cattle
geneticist at the University of Missouri in Columbia, tweeted
that cattle have a relatively small carbon footprint compared
to other industry sectors, adding: “Wouldn't changing
transportation & energy be more important?”

Clim. Change http://doi.org/tvw (2014); Proc. Natl Acad. Sci. USA
http://doi.org/tvx (2014)


Based on data from altmetric.com.
Altmetric is supported by Macmillan
Science and Education, which owns
Nature Publishing Group.


NATURE.COM
For more on
popular papers:
go.nature.com/ajredg


Sperm are
speedier in groups


In the face of competition,
sperm cells travel faster when
they move together in groups 
of an optimal size. 

A team led by Heidi Fisher 
at Harvard University in 
Cambridge, Massachusetts, 
studied rodent sperm cells 
under a microscope, and 
used a mathematical model 
to analyse their swimming 
behaviour. They found that, 
in comparison to solitary 
sperm or those in larger 
groups, intermediate-sized 
aggregates of six or seven 
sperm (pictured) tend to 
migrate the fastest, by taking a 
more direct path. 

Sperm cells from a sexually 
promiscuous species of 
deer mouse, Peromyscus 
maniculatus, were faster and 
more likely to form optimally 
sized clumps than were 
similarly shaped sperm from 
a monogamous sister species,
Peromyscus polionotus. 

The results show how 
sexual selection can shape the 
evolution of cooperation. 
Proc. R. Soc. B 281, 20140296 
(2014) 


NATURE.COM

For the latest research published by 
Nature visit: 
www.nature.com/latestresearch 
Malaria vaccine
London-based
pharmaceutical company
GlaxoSmithKline has asked
the European Medicines
Agency to review its malaria
vaccine, which would be
the world’s first, under a
programme designed to 
license medicines for use 
primarily outside Europe. 

A study published on 29 July 
indicates that the vaccine is 
45% effective in preventing 
infections for 18 months after 
its administration in children 
aged 5-17 months (The RTS,S 
Clinical Trials Partnership 
PLoS Med. 11, e1001685; 
2014). 


Agriculture body 
The US Department of 
Agriculture announced the 
creation of the Foundation 
for Food and Agricultural 
Research on 23 July. 
Agricultural researchers 
have long called for such
a body as a source of extra
funding; it will focus on a
variety of issues, including
plant and animal health
and nutrition. Congress has
provided US$200 million for
the non-profit foundation,
to be matched by external
donations. The body will
be guided by a board of
15 directors.


Nuclear risks 


The US nuclear-power 
industry must be more 
proactive in seeking and 
acting on information about 
potential threats to nuclear 
plants, concludes a report 
released on 24 July by the US 
National Research Council. 
The report to the US Congress 
sought lessons from the 2011 
meltdown at the Fukushima 
Daiichi nuclear plant in Japan. 
The industry, it says, should 
incorporate more modern risk 
assessments of earthquakes,
tsunamis and solar storms
that could cut plants’ electrical
power, a key factor in the
Fukushima accident.


Giant telescope gets boost from Brazil


Brazil’s São Paulo Research Foundation
(FAPESP) confirmed on 22 July that it will
join the US$880-million Giant Magellan
Telescope (GMT), to be located at the Carnegie
Institution for Science's Las Campanas
Observatory in the Atacama Desert, Chile. The
25-metre instrument is one of three competing
megatelescopes to be built in the next decade.
FAPESP will contribute $40 million to the
project, opening up access to researchers from
the state of São Paulo. But the foundation hopes
to share the costs with the Ministry of Science
and Technology of Brazil to allow astronomers
from across the country to access the telescope
when it begins operations in 2021. See
go.nature.com/k3tsgv for more.


Drugs repurposed
UK scientists will gain access
to experimental drugs that
have been deprioritized by
pharmaceutical companies,
the government said on
21 July. The UK Medical
Research Council (MRC)
will fund projects to develop
treatments from as-yet-unnamed
compounds from seven companies. The
programme follows a previous
MRC-funded scheme to
repurpose experimental
drugs that AstraZeneca had
suspended from development.


CDC safety group
In the wake of biosafety
mishaps in the past two
months, the US Centers
for Disease Control and
Prevention (CDC) has formed
an external laboratory-safety
working group. The group will
advise the agency’s head and
its director of laboratory safety
on corrective actions for the
CDC’s labs. It will also identify
training and oversight needs,
and review biosafety protocols.
The group’s first meeting will
be in early August. See pages
507 and 515 for more.


Polish space agency
Poland is set to create a national
space agency, even though
it is already a member of the
European Space Agency (ESA). 
On 25 July, its parliament 
made the decision to establish 
the Polish Space Agency 
(POLSA), which will oversee 
space research and — the 
country hopes — give Polish 
researchers easier access to ESA 
projects and make it simpler to 
set up space-related companies 
and research centres. 


Long drive on Mars 
NASA’s Opportunity rover
has clocked up more than
40 kilometres on Mars —
breaking the record for
long-distance driving on an
extraterrestrial world. On
28 July, the space agency
announced that Opportunity
had surpassed the 1973 record
held by the Soviet Union's
Lunokhod 2 Moon rover.
Experts had been unsure
exactly how far the Soviet
rover had travelled (see Nature
498, 284-285; 2013), but new
calculations involving satellite 
imagery of its tracks show that 
it went about 39 kilometres. 
Opportunity landed on Mars 
in January 2004 and has been 
driving ever since. 


Ebola outbreak 


The largest recorded outbreak 
of Ebola virus has spread to 
Nigeria, which has reported 
its first case of the disease:
an aeroplane passenger died
in Lagos on 25 July after 
travelling from Liberia. The 
World Health Organization 
says that 672 people have so 
far died in the outbreak, which 
is concentrated in the West 
African countries of Liberia, 
Sierra Leone and Guinea. See 
page 520 for more. 


Scripps resignation 
Michael Marletta (pictured)
is stepping down as president
of the Scripps Research
Institute in La Jolla, California,
according to a 21 July statement
from the institute. Marletta’s
plan for a US$600-million
merger between Scripps and
the University of Southern
California in Los Angeles drew
faculty ire and a vote of no
confidence this month. Scripps
faces a $21-million budget
deficit. See go.nature.com/cvozom
for more.


TREND WATCH


The New York Times now writes
about ‘climate change’ much more
than it does about ‘global warming’
— a move that may reflect a
shift in reporting emphasis away
from temperature and towards
effects such as rising sea levels
and ocean acidification. Roger
Pielke Jr, a science-policy expert
at the University of Colorado
Boulder, spotted the trend using
the newspaper's Chronicle tool,
released for public use on 23 July.
The 2009 peak for ‘climate change’
coincides with the United Nations
climate summit in Copenhagen.


Lab chief resigns 
The head of a laboratory in
which employees may have
been accidentally exposed
to live anthrax has resigned.
Michael Farrell stepped
down as director of the
Bioterror Rapid Response
and Advanced Technology
lab in Atlanta, Georgia, on
22 July. The lab, part of the US
Centers for Disease Control
and Prevention, had failed
to inactivate anthrax spores
properly before sending them to
labs with lower biocontainment
levels. Farrell had already
been reassigned from his post
following the incident, which
happened in June.


FACILITIES 


Rocket shortfall 


NASA is short of money to
carry out the first flight test of
its heavy-lift rocket planned
for December 2017, according
to a report released on 23 July
by the US Government
Accountability Office. The
space agency expects to
spend nearly US$12 billion
developing the vehicles for the
first launch; the report says it
would need some $400 million
more. NASA engineers are
gearing up to test parts of the
rocket engines — modified
from the space-shuttle
programme — at a centre in
Mississippi.


Telescope reprieve 
On 21 July NASA announced 
a two-year reprieve for the 
Spitzer Space Telescope, 
which had been scheduled for 
shutdown following a review 
of the agency’s astrophysics 
mission priorities in May. 
Spitzer observes planets,
stars and galaxies at infrared
wavelengths. It launched in 
2003 and, after running out 
of coolant in 2009, began 
observing at slightly higher 
and less-optimal temperatures. 
Because of its Earth-trailing 
orbit, it can observe objects 
that other telescopes cannot, 
such as small asteroids 
following the planet. 


Hawaii green light 
Hawaii's planned Thirty
Meter Telescope has cleared
its last major legal hurdle, and
construction can now begin on
Mauna Kea. The state's Board
of Land and Natural Resources
gave approval for the telescope
observatory to sublease the
site from the University of
Hawaii on 25 July. The Office
of Hawaiian Affairs, which
advocates for the interests
of native Hawaiians, who
consider the mountain holy,
had contested the arrangement
in a petition filed on 7 July. It
later withdrew the petition.


CLIMATE CHANGE IN THE MEDIA


A publicly accessible online tool shows how use of the terms ‘climate
change’ and ‘global warming’ has varied in The New York Times.


The New York Academy
of Sciences holds a
symposium in New York
city to honour the work
of Marshall Nirenberg,
who was awarded a
Nobel prize in 1968 for
his part in deciphering
the genetic code and
protein synthesis. One
session will see experts
discuss legal, ethical
and social issues related
to applications of the
genetic code.
go.nature.com/wqlong


Blavatnik awards 


The three winners of this year’s 
US Blavatnik National Awards 
were announced on 28 July. 
Neurobiologist Rachel Wilson 
from Harvard University in 
Cambridge, Massachusetts, 
was recognized for mapping 
the circuitry of fruit-fly brains. 
Adam Cohen, also from 
Harvard, won for his advances 
in imaging neural activity in 
real time. Marin Soljačić of
the Massachusetts Institute
of Technology in Cambridge
was rewarded for his work on 
electromagnetic phenomena, 
including wireless battery 
charging. Each person receives 
US$250,000 — the largest 
unrestricted cash prize for 
early-career scientists. The 
prizes are awarded annually 
by the Blavatnik Family 
Foundation and the New York 
Academy of Sciences. 


> NATURE.COM 
For daily news updates see: 
NEWS IN FOCUS


Drilling project 
prepares to drop sensors into 
heart of earthquake p.516 


Mycoplasma bacteria contaminate 10%
of cell experiments p.518


Funding, 
logistics and inertia hold 
up Ebola treatments p.520 


Blood 
samples hold promise for 
detecting cancer p.524 


An ‘inactivated’ sample of the anthrax pathogen Bacillus anthracis began growing at a US laboratory. 


INFECTIOUS DISEASES 


Biosafety controls 
come under fire 


Experts call for a stronger safety culture at secure sites after 
incidents involving anthrax and flu in a US laboratory.


BY DECLAN BUTLER 


Recent accidents involving deadly pathogens
at a leading laboratory in the United
States highlight the need for a major
global rethink of biosafety controls, experts say. 
The Centers for Disease Control and Pre- 
vention (CDC) in Atlanta, Georgia, reported 
two accidents involving anthrax and the deadly 
H5N1 influenza virus. Biosafety professionals 
argue that such incidents show that without a 
strong culture of biosafety, even highly secure 
facilities are susceptible to errors that could 
place workers and the public at risk. 


Until now, biosafety has mostly been about 
physical biocontainment, meeting safety reg- 
ulations and following recognized standard 
operating procedures, says Tim Trevan, who 
is executive director of the International Coun- 
cil for the Life Sciences, a non-profit body in 
McLean, Virginia, that advises on biosafety 
policies. But organizations also need to focus 
on developing a stronger safety ethos, he says. 
“I hope that the accidents will trigger profound
cultural change, not just at the CDC but at 
high-containment labs everywhere.” 

The incidents at the CDC occurred in 
March and June. In the first, a sample of a
low-virulence flu virus that was transferred to
another laboratory had been accidentally con- 
taminated with the lethal H5N1 avian flu strain. 
The second incident involved the transfer of 
potentially inadequately inactivated anthrax 
bacteria from a biosafety-level-3 laboratory 
to a lab with a lower safety level that was not 
equipped to handle such a dangerous pathogen. 

The events have triggered a media and 
political storm, leading to considerable pres- 
sure on the CDC and other US labs to improve 
their practices. On 16 July, Thomas Frieden, 
the CDC director, was called to testify on the 
anthrax incident before a House of Representa- 
tives committee. “The fact that something like 
this could happen in such a superb laboratory 
is unsettling because it tells me that we need 
to look at our culture of safety throughout all 
of our laboratories,” he said ahead of the hearing.
“We are definitely looking at the implications
for laboratories around the country and
around the world.”

Last week, the CDC announced the creation 
of an independent committee to review the 
agency's safeguards. Safety culture is among 
the topics the committee will discuss when it 
meets for the first time next month. 

The term ‘culture of safety’ is more than 
just jargon — management frameworks to 
aid organizational safety are well established 
in, for example, the airline and nuclear-power 
industries, says Trevan. Creating such a culture 
requires practices and training that are targeted 
at addressing risks in a structured manner, and 
constantly monitoring and improving perfor- 
mance. Yet researchers and oversight bodies 
all too often have a “checkbox culture”, he adds.

This can result in a management mentality 
of “we don’t care if the plan works, as long as
you have a plan”, says Sean Kaufman, president
of Behavioral-Based Improvement Solutions,
a company in Woodstock, Georgia,
that trains staff who work in biocontainment 
laboratories. He says that institutions are 
often reluctant to spend resources on improv- 
ing practices: “Typically, leadership will only 
invest so much in biosafety; the bare minimum 
required to keep them out of trouble and in 
compliance.”

Over the past decade, more attention has 
been paid to biosafety culture as the field has 
become increasingly professional. In 2008, 
the European Committee for Standardization 
(CEN) in Brussels adopted the first internationally
recognized management framework for
organizational safety in facilities handling
dangerous pathogens: CEN Workshop
Agreement (CWA) 15793. This voluntary 
framework is currently being adapted to 
become an International Organization for 
Standardization (ISO) standard, which 
would give it worldwide recognition. 

The World Health Organization (WHO) 
recommends that organizations adopt 
CWA 15793, says Nicoletta Previsani, a
former head of biosafety and laboratory
biosecurity at the WHO who is now
responsible for containment at its polio- 
eradication programme. “CWA 15793 
really is a major shift in thinking,” she says, 
adding that its implementation nonetheless 
requires considerable investment. 

The WHO has adopted the standard for 
oversight of the two laboratories holding 
the last stocks of the smallpox virus — one 
at the CDC in Atlanta, and the other near 
Novosibirsk in Russia. It has also specifi- 
cally recommended that facilities carrying 
out risky gain-of-function flu research, 
which increases the transmissibility, viru- 
lence or host range of viruses, be CWA 
compliant or equivalent. 

But wider uptake has so far been lim- 
ited. The CDC, for example, has not 
fully implemented the standard, and in 
a survey last year of 118 members of the 
European Biosafety Association, three- 
quarters of whom were biosafety profes- 
sionals, just 33% reported that they were 
using CWA 15793 in their institutions, and 
15% had never even heard of it. Reasons 
given for not implementing the standard 
included a lack of resources, its “exces- 
sive” nature and the availability of similar 
national standards. 

However, many organizations are using 
the CWA standard to improve biosafety 
management without going to the time and 
expense of seeking formal certification, says 
Gary Burns, a UK biosafety consultant who 
was vice-chair of the group that developed 
CWA 15793. He hopes that if it is adopted 
as an ISO standard, this will lead to greater 
formal and informal use. 

But such safety-management standards 
are not a “magic bullet”, cautions Maureen
Ellis, executive director of the Interna- 
tional Federation of Biosafety Associations 
(IFBA) in Ottawa, Canada, because to 
be effective, all staff must buy into them. 
Researchers too often consider biosafety as 
an extra burden, “something they have to 
do because the rules say so”, she explains.

The IFBA has sought funding to advocate 
for improved biosafety cultures in labora- 
tories, but funders are not interested, Ellis 
adds, partly because tangible outcomes are 
difficult to measure. “There is money for 
diagnostics and research, but ask for money 
for biosafety and it’s just not there, as it is 
lower priority,” she says.

GEOPHYSICS


Researchers are set to drill a 1.3-kilometre borehole in a seismic fault near Whataroa, New Zealand. 


Project drills deep 
into coming quake 


Sensors in borehole at New Zealand seismic fault will peek 
under the surface of impending rupture. 


BY KATIA MOSKVITCH 


For the first time, researchers are preparing
to drop a battery of sensors deep into
a seismic fault to record the build-up and
occurrence of a massive earthquake.

An international team will drill a 1.3-kilometre
hole in the Alpine Fault in New Zealand,
through which they will gather crucial
data that could help to predict future quakes.
The fault ruptures roughly every 330 years, 
triggering a quake of up to magnitude 8 
(K. R. Berryman et al. Science 336, 1690-1693; 
2012). The most recent earthquake was in 
1717, so the next one is expected any time now. 

“If we go on to record the next earthquake, 
then our experiment will be very, very special,” 
says Rupert Sutherland, a tectonic geologist at
GNS Science, a government-run Earth-science
organization in Lower Hutt, New Zealand, 
and one of the project’s leaders. “A complete 
record of events leading up to and during a 
large earthquake could provide a basis for 
earthquake forecasting in other geological 
faults.” 

The Alpine Fault, which runs for about 
600 kilometres along the west coast 
of South Island, marks the boundary 
between the Pacific and Australian plates 
(see ‘In the zone’). Every year, these plates 
slide past each other by about 2.5 centi- 
metres, building up pressure. Geologists 
are confident that the fault is “ready 
to break in its next earthquake”, says
Sutherland, with a 28% chance of a rupture
in the coming 50 years. The Alpine Fault has 
been specifically selected for the drilling site 
because it is so late in this earthquake cycle. 

In 2011, Sutherland’s team completed a test 
phase — the Deep Fault Drilling Project 1 
(DFDP-1) — in which they sank two bore- 
holes, the biggest reaching 151 metres into the 
fault. In the next two weeks, work is begin- 
ning on the DFDP-2, which will drill a hole 
10 centimetres across and 1.3 kilometres deep 
at the same site near the village of Whataroa. 
At that depth, the team will reach the ‘crush
zone’ where the two plates meet, and will be
able to take measurements representative of 
conditions deep in the crust, where earth- 
quakes originate. 

The DFDP-2 project will cost about 
US$2 million, and is being backed by the Inter- 
national Continental Scientific Drilling Pro- 
gram in Potsdam, Germany, and the Marsden 
Fund of the Royal Society of New Zealand in 
Wellington. 

The first part of the experiment will involve 
collecting geological samples and inserting 
sensors into a shallow borehole to measure 
the temperature and pressure inside the fault. 
The hole will then be reinforced and deepened 
before instruments that are able to record key 
indicators of seismic activity — including 
images, sound, temperature and pressure — 
are lowered into the fault. The team hopes to 
complete all drilling and sensor-laying work at 
the borehole by early December. 

Data collected by the sensors will be fed 
into computer simulations to test theories 
of how faults rupture, and will help the team 
to develop detailed models of how the fault
behaves at different points in the earthquake
cycle. One idea that will be tested is that large
differences in groundwater pressures on either
side of the fault zone could indicate whether a
quake is imminent.

IN THE ZONE
The Alpine Fault project near Whataroa, New Zealand, will measure temperatures, pressures and geological properties between the Australian and Pacific tectonic plates. Sensors in a borehole will monitor the build-up to an expected magnitude-8 earthquake. During a fault rupture, the Pacific plate moves southwest and up in relation to the Australian plate. The 1.3-kilometre borehole will be deep enough to reach the crush zone between the plates.

“The fault appears to currently form an 
impermeable barrier, and it’s likely that time- 
dependent differences in groundwater pres- 
sure on either side of the fault play a role in 
governing earthquake nucleation processes 
and the radiation of seismic waves,” says John
Townend, a seismologist at Victoria
University of Wellington, who is part
of the project.

The work will also help to improve
understanding of plate-boundary
mechanics and seismic hazards, says David
Boon, a geologist at the British Geological 
Survey in Cardiff, who is not involved in the 
research. “This drilling will underpin the sci- 
ence of modelling stress build-up in the crust 
and, importantly, stress release — which can 
cause large destructive earthquakes, and sec- 
ondary hazards such as tsunami, landslides and 
liquefaction”, in which soil behaves like liquid.

Researchers have used deep-drilling
data for models before, but only in the
aftermath of a quake. “To develop computer 
simulations for how earthquakes happen, 
information about initial conditions within 
the geological fault that is being ruptured is 
essential,” says Sutherland. “After our experiment,
realistic data, based on observation,
can be used to construct models, so they
will have much more value.”

The only previous major attempt to peek 
inside an active fault is the San Andreas Fault 
Observatory at Depth (SAFOD), a 3.2-kilo- 
metre borehole near Parkfield, California. But 
this was done at a ‘creeping’ section of the fault, 
which experiences regular but small quakes 
rather than infrequent large ruptures. 

Although not as deep as SAFOD, the 
DFDP-2 is calling on the US project’s expe- 
rience, says Cliff Thurber, a seismologist at 
the University of Wisconsin–Madison and a
participant in the DFDP-2. In particular, the 
team hopes to learn from SAFOD’s technical 
setbacks, which included instruments break- 
ing because of the enormous heat and pressures 
deep underground. Thurber is also already 
thinking about a next borehole project, which 
he would like to see get even closer to an earth- 
quake epicentre. “DFDP-2 is a great project, but 
my hope would be that there will be a DFDP-3 
to reach deeper sometime soon,” he says.

CELL CULTURE


Mycoplasma (orange) can surround and infect cells and invalidate results of gene-expression studies.

Contamination
hits cell work 


Mycoplasma infestations are widespread and costing 
laboratories millions of dollars in lost research. 


BY EWEN CALLAWAY 


John Hogenesch saw the anguished look
on his technician’s face and knew instantly
that her experiments had gone haywire; he
also had a pretty good idea of why. “Check
your culture for Mycoplasma contamination,”
he advised. The bacterium is notorious
for infecting cell cultures, and had indeed 
compromised her experiments. 

In fact, the problem is widespread. 
Hogenesch, a genome biologist at the University
of Pennsylvania in Philadelphia, and
his colleague Anthony Olarerin-George have
found that more than one-tenth of gene-expression
studies, many published in leading
journals, show evidence of Mycoplasma
contamination¹. The infestations are undermining
research findings and wasting huge
amounts of money, Hogenesch says. 

He should know. His lab quickly over- 
came an infestation last year, but a previous 
plague cost it some US$100,000 and a year of 
research. Mycoplasma takes hold quickly, he
says. “All it takes is one person not to check,
and — bam — you have it.” The bacterium 
often comes from lab workers, and is not 
killed by the antibiotics typically used to rid 
cell cultures of contaminants. And unlike 
many other microorganisms, which turn the 
growth medium turbid, Mycoplasma leaves 
no visible signs of its presence. 

Mycoplasma is a long-standing problem. 
The bacterium plagued early cultures of 
HeLa cells, a widely used human cell line 
established in the 1950s. Surveys of individual 
collections have long served as a warning: a 
1993 study², for instance, found Mycoplasma
in 15% of 20,000 cultures from US Food and 
Drug Administration labs. Companies and 
centres that distribute cells and reagents are 
now fastidious in screening for it. 

To get a global perspective on the problem, 
Hogenesch and Olarerin-George looked 
for stretches of Mycoplasma DNA in RNA- 
sequence data from more than 9,000 samples 
— collected during experiments done between 
2012 and 2013 to measure gene expression in
cultured mammalian cells. A total of 11% of
the samples were found to contain Mycoplasma 
DNA at levels indicative of contamination. 

Some of the studies with the highest levels 
were published in leading journals such as 
Cell, Nature and Proceedings of the National 
Academy of Sciences, says Hogenesch. Con- 
tamination does not necessarily invalidate 
those findings, but the bacterium can influ- 
ence the expression of hundreds of genes and 
hinders cell growth by competing for nutri- 
ents. In one particularly contaminated data 
set, of individual lymphoma cells, Hogenesch
and Olarerin-George identified 61 genes 
whose levels were altered by Mycoplasma. 
Their work is available on the bioRxiv preprint
server¹, and they plan to submit their
findings to a peer-reviewed journal soon. 

If more than one-tenth of cell cultures are 
contaminated, the costs in wasted time and 
resources, such as repeating experiments and 
replacing cells, could run into hundreds of 
millions of dollars, says Hogenesch. In 2013, 
for example, the US National Institutes of 
Health spent about $3 billion on research that 
uses cell lines, and Hogenesch estimates that 
about one-third of his lab costs go on tissue 
culture. 
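
The scale of that estimate can be checked with a little arithmetic. The Python sketch below is a back-of-envelope illustration only, not Hogenesch’s own calculation: it assumes, purely for illustration, that the share of spending put at risk equals the 11% contamination rate found in the survey, and the function name `estimated_loss` is hypothetical.

```python
# Back-of-envelope sketch. The assumption that spending at risk scales
# one-to-one with the contamination rate is illustrative, not from the study.

def estimated_loss(annual_spend_usd: float, contaminated_fraction: float) -> float:
    """Return the spending at risk if the contaminated share were wasted."""
    if not 0.0 <= contaminated_fraction <= 1.0:
        raise ValueError("contaminated_fraction must be between 0 and 1")
    return annual_spend_usd * contaminated_fraction


nih_cell_line_spend = 3e9   # about $3 billion (NIH, 2013, as quoted in the text)
contamination_rate = 0.11   # 11% of samples in the survey

loss = estimated_loss(nih_cell_line_spend, contamination_rate)
print(f"Illustrative estimate: ${loss / 1e6:.0f} million")
```

Under these assumptions the figure comes to about $330 million, which is in line with the “hundreds of millions of dollars” quoted; a smaller wasted share per contaminated culture would lower it proportionally.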


CONTAMINATION CRACKDOWN 
Contamination is a result of “sloppy cell-culture
work”, says Hans Drexler, a physician-scientist
at the German Collection of
Microorganisms and Cell Cultures in Braun- 
schweig. “Mycoplasma don't fall from the sky,” 
he says. “They are introduced into the cell culture
by people.” He is also not surprised that so
many cell cultures are contaminated. A similar 
percentage of the human leukaemia and lym- 
phoma cell lines his lab received from other 
researchers between 2010 and 2013 tested 
positive for Mycoplasma, he says. But this is 
an improvement: one-quarter of such cultures 
were tainted in the early 1990s, he found³.

Drexler believes that Mycoplasma 
contamination persists because of a “black 
market” in cell lines — researchers often share 
cultures in violation of materials-transfer 
agreements. He estimates that 10% of the cell 
lines he receives from these sources are con- 
taminated, compared with none from official 
suppliers. To solve the problem, he urges labs 
to spend the extra money on cell lines from 
reputable sources, and test those they have for 
contamination. 

“There’s no magic twenty-first-century 
bullet that’s going to kill these things,” Hogen- 
esch says. “We have to be continuously vigi- 
lant, clean up the cultures that have them, and 
destroy the bacteria altogether.”


1. Olarerin-George, A. O. & Hogenesch, J. B. Preprint at bioRxiv http://dx.doi.org/10.1101/007054 (2014).
2. Rottem, S. & Barile, M. F. Trends Biotechnol. 11, 143–151 (1993).
3. Drexler, H. G., Uphoff, C. C., Dirks, W. G. & MacLeod, R. A. F. Leuk. Res. 26, 329–333 (2002).


FDA debates trial-data secrecy


US drug regulator weighs up merits of disclosing preliminary results. 


BY HEIDI LEDFORD 


Despite a trend towards increased
transparency in clinical-trial data, the
US Food and Drug Administration
(FDA) is asking whether there are times when 
participants and researchers should be kept in 
the dark. As pharmaceutical companies push 
for studies that first justify a drug's approval, 
then monitor safety once it reaches the market, 
the agency fears that publicizing the early data 
could bias the final results. 

In raising the matter, the FDA could energize 
the debate about a long-standing clinical 
conundrum, says Iain Chalmers, coordinator 
of the James Lind Initiative, a group based in 
Oxford, UK, that aims to improve clinical trials. 
“There hasn’t been much discussion about this,”
he says. “There needs to be much more.” 

On 11 August, the FDA will hold a public 
hearing in Silver Spring, Maryland, to discuss 
situations in which preliminary results from 
clinical trials should be kept confidential. The 
FDA is obliged to release a summary of the data 
that it uses to approve a drug. But the public 
rarely sees the data given to safety committees 
to decide whether a trial should continue. Even 
if those data are not definitive but lean one way 
or another, making them public may spook 
study participants or bias investigators towards 
a particular outcome, the agency fears. 


SCIENCE OVER SUBJECTS 

The practice of confidentiality has been 
debated by researchers and ethicists for some 
time, and FDA memos arguing in favour of 
withholding some interim data have not con- 
vinced everyone. Although the memos have 
pointed out many possible negative conse- 
quences for a trial if such data are divulged, 
they have ignored the ethical ramifications 
of keeping information from participants, 
says Michael Carome, director of the health- 
research group at Public Citizen, a non-profit 
consumer advocacy group in Washington DC. 
“The agency wants to get an answer to scien- 
tific questions,” he says. “The question is: in 
order to get their wish, are they perhaps put- 
ting human subjects at risk?” 

Changes to the drug-approval process have 
made that question trickier than ever. In the 
past, sponsors rarely submitted interim data 
to the FDA. But in 2007, researchers found 
signs that a popular diabetes drug may have 
been increasing deaths due to cardiovascular 
events such as heart attacks or strokes. As a 
result, the agency began to demand large, prolonged
safety trials of some diabetes drugs (see
‘Mum’s the word’). Pharmaceutical companies
are typically allowed to market such a drug
once they have shown that it raises the risk
of cardiovascular events by no more than
80% relative to the control group; then they
are asked to do a post-approval study demonstrating
that the drug boosts that risk by no
more than 30%.


MUM’S THE WORD

When approving the diabetes medication alogliptin, the US Food and Drug Administration deviated from its usual practice of publicly releasing the supporting data.

The Japanese firm Takeda Pharmaceuticals applies to the FDA for approval of its diabetes drug alogliptin.

The FDA issues detailed standards for cardiovascular-safety tests of new diabetes drugs.

28 August 2009: Clinical trial launched to test cardiovascular safety of alogliptin.

The FDA receives interim data from safety trial.

January 2013: The FDA approves alogliptin but does not release interim data.

3 October 2013: Final results of safety trial are published and show no evidence of cardiovascular risk.

11 August: The FDA is holding a public meeting to discuss disclosure of interim clinical-trial data.

Increasingly, companies are petitioning to 
combine the two studies into one large trial, 
and use interim data to clear the first hurdle.
One such case came to the FDA in 2011, when
it evaluated alogliptin, a diabetes drug made 
by Takeda Pharmaceuticals in Osaka, Japan. 
Interim analyses of the drug’s heart risks were 
conducted once 81 cardiovascular events had 
been seen in study participants, says William 
White, a specialist in preventive cardiology at 
the University of Connecticut School of Medi- 
cine in Farmington, who led the study. Those 
data showed that the drug did not greatly affect 
the rate of cardiovascular events, White says, 
but could have been misinterpreted to suggest 
that the drug actually lowered the risk. 


EARLY RELEASE 

Rather than releasing those data when it 
approved alogliptin in January 2013, the FDA 
simply announced that the findings showed 
that the trial was safe enough to proceed. In a
March 2013 memo, Mary Parks, head of endo- 
crinology products at the FDA’s Center for 
Drug Evaluation and Research in Silver Spring, 
argued that the secrecy was necessary so that 
long-term safety data could be obtained in a 
timely fashion. 

White, who says that even he did not see 
the data until the study was finished, says that 
secrecy was key to successful completion of the 
trial because investigators might have refused 
to put patients on placebo had they seen the 
interim data. 

The final results, published in October, 
confirmed that the drug had no significant 
effect on cardiovascular risk (W. White et al. 
N. Engl. J. Med. 369, 1327-1335; 2013). 

The release of interim results might also 
prompt patients to abandon a trial, but that 
should be their choice, says Richard Lilford, 
chair of public health at the University of 
Warwick, UK. He argues that trial designers 
too often default to secrecy, and risk sacrificing 
their obligation to participants in the process. 
Instead, he advocates that the data be shared 
from the start. “Among the trial fraternity, this 
idea is terribly unpopular,” he says. “They think
that clinical trials must run until they've got a 
clear answer.” 

Paul Armstrong, a cardiologist at the 
University of Alberta in Canada, has served 
on more than 30 safety boards and says that it 
is standard to keep interim data confidential. 
But sometimes, he says, boards do decide that 
the benefits of revealing the data outweigh the 
risks. “We always ask ourselves, ‘could we go 
and get consent for the next patient and feel 
confident they were adequately informed 
about participating in the trial?’ That is the 
bottom line.”

MAJOR OUTBREAKS


With more than 800 confirmed cases so far, the current Ebola virus outbreak is the largest in 
recorded history. After the first cases were reported in Guinea in March, the virus spread to 
neighbouring Liberia and Sierra Leone. Previous outbreaks were largely in central Africa.

INFECTIOUS DISEASES




Ebola treatments 
caught in limbo 


Logistics and lack of funds keep experimental drugs and 
vaccines from being used in Africa outbreak. 


BY SARA REARDON 


Medical relief workers fighting an
Ebola outbreak in West
Africa have not been welcomed with
open arms. Death was all that the hazmat-suited 
visitors seemed to bring. Most patients who 
entered the makeshift hospitals died, their fami- 
lies forbidden to handle their bodies. Rumours 
flew that these newcomers were harvesting 
organs and conducting fatal experiments. 

So people scattered, making a bad situation 
worse. The outbreak, the biggest recorded 
in Ebola history, has so far killed more than 
670 people in West Africa and is thought to 
have infected about 400 more, and it shows no 
sign of abating (see ‘Major outbreaks’). 

Doctors have no cure to offer the infected. 
Understaffed clinics must make do with isolat- 
ing infected people, finding and quarantining 
their families, and educating the public on how 
to avoid spreading the disease. Although several 
vaccines and treatments for Ebola do exist, they 
are stalled in various stages of testing owing to
a lack of funding and of international demand.
Even if they did move forwards, it would be 
years rather than months before the measures 
would reach the people in need. 

For researchers such as Heinz Feldmann, a 
virologist at the US National Institute of Allergy 
and Infectious Diseases (NIAID) in Hamilton,
Montana, the situation seems like it could have 
been avoided. In 2005, he published a vaccine 
platform based on vesicular stomatitis virus 
(VSV) that has since yielded an Ebola vaccine 
that is effective in macaques (T. W. Geisbert 
et al. PLoS Med. 2, e183; 2005). But money is 
not available to take the next step — testing the 
vaccine’s safety in healthy humans, says Feldmann.
Compared with malaria or HIV, “Ebola
is just not that much of a public-health problem
worldwide”, he says, and consequently draws
little interest from public or private funders.

“What works for Ebola is good old-fash- 
ioned public health,” says Thomas Frieden, 
director of the US Centers for Disease Con- 
trol and Prevention in Atlanta, Georgia. “It 
would be great to have a vaccine, but it’s not
easy to do and not clear who you’d test it on.”

The VSV vaccine seems to be a promising 
option because it can be used either preventively 
or just after a person is infected. In 2009, it was 
used on a German lab technician who had acci- 
dentally pricked herself with a needle carrying 
Ebola virus. Although it is unclear whether she 
was ever infected, the technician survived and 
suffered no ill effects from the vaccine. “Every- 
body in my lab would volunteer to take the vac- 
cine,” says Thomas Geisbert, a microbiologist at 
the University of Texas Medical Branch in Galveston,
who is also working on the medicine.

The NIAID Vaccine Research Center in 
Bethesda, Maryland, has developed a vaccine 
that is carried by a chimpanzee adenovirus, 
similar to the virus that causes the common 
cold. The institute hopes to begin testing in 
healthy people as early as September. Barney 
Graham, deputy director of the research cen- 
tre, says that the institute is talking with the 
Food and Drug Administration (FDA) to 
speed up the approval process, a position that 
is strengthened by the outbreak in West Africa. 

Biotechnology companies are also develop- 
ing treatments at a pace that could be acceler- 
ated. Mapp Biopharmaceutical in San Diego, 
California, is testing combinations of mono- 
clonal antibodies that target the virus, and also 
hopes to begin human trials soon. And with 
US$140 million from the US Department of 
Defense, Tekmira in Burnaby, Canada, is test- 
ing a treatment called TKM-Ebola, which uses 
small RNA molecules to bind the virus and 
target it for destruction. The company began 
testing the treatment in humans in January, but
on 3 July, the FDA put the study on hold until 
the company could provide more data on how 
the treatment works. Tekmira says that it is 
confident it will be able to restart the trial soon. 

The timing of the outbreak is “unfortunate”, 
says Armand Sprecher, a public-health specialist
at Médecins Sans Frontières (also known as
Doctors Without Borders) in Brussels. “If this 
had happened a year or two from now maybe
we’d be in a better position.”

A treatment could be approved by the FDA 
on a ‘compassionate use’ basis, but that process 
would have to mesh with a host country’s rules. 
“A country has to request these things; it’s not 
something we can force on them,” says Gene 
Olinger, a virologist at the contract research 
organization MRIGlobal in Frederick, Mary- 
land. “We have to follow their internal policies 
for drug development and for testing.”


CORRECTION 

In the News Feature ‘Hello, Governor’ (Nature 
511, 402-404; 2014), the number of 
co-authors for the consensus statement was 
given as 13 instead of 14. And the title of the 
report omitted the word ‘scientific’. Finally, 

it was Governor Brown, not Elizabeth Hadly, 
who delivered the report to political leaders. 


SOURCE: WHO


Rivers on the run


As the United States destroys its old dams, species 
are streaming back into the unfettered rivers. 


BY RICHARD A. LOVETT 


Just outside the small town of Stabler in Washington, hydrologist
Bengt Coffin surveys a mountain river that he helped to revive from
a decades-long coma.

Today, the clear waters of Trout Creek run fast and cool between banks 
covered in young alder trees. But just five years ago, an 8-metre-high con- 
crete wall blocked the river at this site. The dam and the reservoir behind 
it had tamed the river and made it difficult for endangered steelhead trout 
(Oncorhynchus mykiss) to reach their spawning grounds upstream. 

In 2009, Coffin led the US Forest Service effort to remove the dam, and 
Trout Creek has since regained the look of a young river. Vegetation has
covered the scars left by the dam and reservoir, and steelhead and other 
species have started to rebound. 

The revival of Trout Creek is part of a growing trend in the United 
States. About half of the nation’s roughly 85,000 known dams no longer 
serve their intended purposes, and an increasing number are being
removed. Around 1,150 have gone so far, mostly
in the past 20 years, according to a tabulation by
the watchdog group American Rivers in Washington DC.
In an era when many countries are
still building dams, the United States is taking
them out. “It used to be a crazy idea. Now it’s accepted,” says Amy Kober, 
director of communications for American Rivers. 

Most of the demolished structures were lower than 5 metres, but in
the past few years, projects in the Pacific Northwest have removed much
taller ones. At the top end of the spectrum, the US National Park
Service is dismantling the 64-metre-high Glines Canyon Dam, the larger
of a pair of big dams on Washington's Elwha River. Many of the larger
dams were removed because their operators decided that it was too
costly to bring the old structures in line with modern safety and
environmental requirements.

An excavator chips away at Washington’s Glines Canyon Dam in 2012.

NATURE.COM: For a video and slideshow of dam removals, see
go.nature.com/mlldiz

The power companies’ actions are boons for fish advocates who seek
to restore populations of endangered species in the rivers. The
dam-elimination trend has also provided an unanticipated research
opportunity, because the projects have used diverse approaches to
minimize the damage caused by unleashing huge floods of water and
decades of accumulated sediment. Some efforts take a slow path,
restoring river flow over months or years. Others use explosives and
other engineering techniques to drain reservoirs within hours.

Data are still preliminary, but they suggest that both approaches can 
bring rapid benefits — not just to fish, but also to the habitat on which they 
depend. The rivers are rebounding at the sites studied so far, says Amy 
East, a geomorphologist with the US Geological Survey (USGS) in Santa 
Cruz, California. “We've seen a lot of resilience.” 


OUT OF COMMISSION 

At Trout Creek, Coffin and his colleagues decided to take the cautious 
route when removing the ageing Hemlock Dam. Built in 1935, the struc- 
ture provided power and irrigation for a nearby tree nursery that shut 
down in 1997. It had a fish ladder to allow animals to bypass the dam 
and swim upstream, but it was poorly built by modern standards and the 
number of fish using it had steadily declined. 

A bigger concern was the reservoir, which had been steadily filling in 
with silt. By the time the dam was dismantled, the reservoir had become 
so shallow that it was possible to wade all the way across, says Coffin, wav- 
ing a hand at mid-thigh level to show the depth of the water. In the mid- 
summer sun, temperatures in the water could reach 26°C — too warm 
for steelhead, he says. 

When the Forest Service decided to remove the dam, it was particularly 
concerned about the mud, sand and gravel that had built up in the reser- 
voir. Coffin and others worried that flooding the river with all that sedi- 
ment would harm the steelhead below the dam in Trout Creek. “All of our 
baby fish are down there,” Coffin says. “We didn’t want to decimate them.”

The solution was to divert the river into a big pipe and then hire a fleet 
of dumper trucks to carry away the exposed sediment. In the process, the 
workers rediscovered the creek’s original channel through the reservoir 
bottom and reinforced its banks with logs to stop them from eroding. 

All those efforts seem to have worked. When 
water was first allowed to flow back through the 
old reservoir bottom, it initially ran muddy. 
But just seven hours later, Coffin’s team documented
the first steelhead venturing into the
new channel above the old dam site. “It was 
that clear,” he says. 

Since then, the number of steelhead in the 
river and its tributaries has more than dou- 
bled, says fisheries biologist Patrick Connelly 
at the Columbia River Research Labora- 
tory in Cook, Washington, although he notes that fish populations are 
variable enough that it will take several years to know whether the trend 
will continue. 

Returning steelhead are not the only signs of success. Just above the old 
dam site, Coffin winds his way through patches of alder trees that were 
planted after the dam was removed, then crosses a rocky beach to the river. 
The rounded stones range from the size of potatoes to loaves of bread, and 
make for tricky footing. But Coffin is thrilled to see them because none 
of these ankle-breakers was here when the dam was first taken out. “All 
of this washed in,” he says. 

The cobbles provide nesting spots for the trout and a habitat for the 
insects that the fish eat. “People pay attention to the big animals,” Coffin
says, “but the bugs are an important part of the system.” Reaching into 
the water, he plucks out a couple of rocks, turns them over and points out 
six types of insect clinging to the underside, including caddisfly larvae 
and a stonefly. “The year after the dam was removed, these wouldn't 
have been here,” he says with satisfaction. 

Elsewhere in the Pacific Northwest, teams opted for much more 
extreme measures to remove the 14-metre-tall Marmot Dam on Oregon's 
Sandy River in 2007 and the 38-metre-tall Condit Dam on Washington's 
White Salmon River in 2011. 

The dams, both nearly a century old, were too big to take the same 
approach as at Trout Creek, where it had cost nearly US$1 million to cart 
away 42,000 cubic metres of sediment. Marmot had nearly 20 times more 
sediment and Condit had double that of Marmot. Because it would be 
too expensive to dig out that material and carry it away, project managers 
opted for a more radical approach, colourfully described as “blow and go”,
in which the dams were removed quickly, says Gordon Grant, a research 
hydrologist at the Forest Service's Pacific Northwest Research Station in 
Corvallis, Oregon. 
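
As a rough, back-of-envelope illustration of why trucking the sediment away was ruled out, the figures quoted above can be scaled up in a few lines of Python. This is only a sketch under stated assumptions: it treats “nearly 20 times” and “double” as exact multipliers, rounds “nearly US$1 million” up to exactly $1 million, and assumes that the Trout Creek cost per cubic metre would carry over unchanged to the larger dams, which nothing in the reporting confirms. The function and variable names are invented for the example.

```python
# Back-of-envelope scaling of the sediment figures quoted in the text.
# Assumption: the Trout Creek unit cost applies unchanged to bigger dams.

TROUT_CREEK_SEDIMENT_M3 = 42_000   # cubic metres carted away at Trout Creek
TROUT_CREEK_COST_USD = 1_000_000   # "nearly US$1 million", rounded up

MARMOT_MULTIPLIER = 20             # "nearly 20 times more sediment"
CONDIT_MULTIPLIER = 2              # "double that of Marmot"


def estimate_haulage_cost(volume_m3: float) -> float:
    """Cost in US dollars of removing volume_m3 at the Trout Creek rate."""
    unit_cost = TROUT_CREEK_COST_USD / TROUT_CREEK_SEDIMENT_M3
    return volume_m3 * unit_cost


marmot_m3 = TROUT_CREEK_SEDIMENT_M3 * MARMOT_MULTIPLIER
condit_m3 = marmot_m3 * CONDIT_MULTIPLIER

if __name__ == "__main__":
    for name, volume in [("Marmot", marmot_m3), ("Condit", condit_m3)]:
        cost_millions = estimate_haulage_cost(volume) / 1e6
        print(f"{name}: {volume:,} m^3, roughly ${cost_millions:.0f} million")
```

On these assumptions the haulage bill grows in direct proportion to the volume, so the two larger dams would have cost tens of millions of dollars to dig out; the real figures would depend on access, distance and disposal, none of which is modelled here.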

The results were impressive — but very different at the two sites. At 
Marmot, the sediment contained an equal mixture of sand and gravel. 
Once exposed to river action, it eroded out relatively quickly but sedately, 
with about half of it gone within 8 months. Researchers were surprised to 
find that the fish seemed little affected — the first curious salmon poked 
its nose back towards the former dam site within a day. 

At Condit, the sediment contained a higher proportion of fine-grained 
material: 35% mud, 60% sand and just 5% gravel. The result was predict- 
able in retrospect, but nobody anticipated it. 

When engineers blew open a hole at the bottom of the dam, a jet of 
black liquid shot out as if from a giant fire hose. Instead of the expected 
flood of water, what came out was more like a mudflow, as waterlogged 
sediment from the reservoir slumped into the rapidly dropping water, 
then blasted downriver in a slurry that was as much as 28% sediment
by volume. The reservoir lost its water and much of its sediment load in 
three hours. “It was almost like a volcanic event,” says Jon Major, a
geomorphologist at the USGS’s Cascades Volcano Observatory in Vancouver,
Washington. The 5-kilometre-long stretch of river between the dam and 
its confluence with the Columbia River temporarily became a muddy 
wasteland. With this kind of approach, says East, the slug of sediment 
wipes out everything, but the river can start recovering much sooner. 

The National Park Service took a much more conservative approach 
to removing two large dams on the Elwha River, because the stakes 
were higher. The upstream portions of the Elwha drain more than 
100 kilometres of pristine habitat on the north side of Washington's 
Olympic National Park. A river that large 
produces a lot of sediment: an estimated 
18 million cubic metres was expected to 
escape from behind the dams, says Jason 
Dunham, an aquatic ecologist at the USGS 
office in Corvallis. That is the equivalent of 
filling eight typical American-football stadi- 
ums. And before the dams cut the number 
of salmon returning each year to around 
10,000, the Elwha supported hundreds of 
thousands of fish. 

Unwilling to risk the blow-and-go approach on both dams, engineers 
opted for a compromise. They quickly removed the lower, 32-metre-high 
Elwha Dam, which contained only about one-sixth of the total sediment. 
But the upstream Glines Canyon Dam, which is twice as big, is coming out 
in a series of steps that have so far lowered it to a 9-metre stub of its former
self. East compares the method to deciding whether to uncover a wound 
quickly or gradually. The approach on the Elwha, she says, is like “pulling 
the Band-Aid off slowly, over the course of three years”. 

The good news in these giant projects is that scientists have not seen 
any serious harm from the feared releases of sediment. Instead, the rivers 
have proved unexpectedly efficient at flushing the worst of the mud down- 
stream towards the sea, rather than letting it accumulate in river-choking 
mudflats. “It was not the big catastrophe people thought,” East says. 

Emily Stanley, a river ecologist at the University of Wisconsin-Madison 
who has studied dam removals for more than a decade, agrees that it is 
hard to think of one that had “catastrophically awful” results. (The one 
exception, she says, was an event in the 1970s, when the demolition of 
a dam on the Hudson River allowed sediment containing high levels of 
toxic chemicals called polychlorinated biphenyls (PCBs) to escape from
the reservoir and flow downstream.) 

Data on the recent dam removals suggest that the fish are now coming 
back to the unfettered rivers. At Condit, fish were seen returning within 
weeks of the explosion. Two years later, the total exceeded 5,500, including 
steelhead and spring Chinook (Oncorhynchus tshawytscha), which had 
been effectively extirpated from the river, says Jody Lando, a quantitative 
ecologist with Stillwater Sciences in Portland, Oregon, who reported her 
results in May at an aquatic-sciences meeting in Portland. 

Even on the Elwha, where the Glines Canyon Dam still impedes the 
river, East says that hundreds of salmon have been seen spawning in the 
lower dam’s former lake bed. “That hasn't happened in over a hundred 
years,” she says.

In part, these successes may reflect the fact that the Pacific Northwest is 
a landscape built by geological disturbances — volcanic outbursts, land- 
slides and floods. Local wildlife has had to adapt to such upheavals, and 
salmon do that by not always returning to the precise stream of their 
birth. “There's a fair amount that stray,” says East. It is those strays that 
repopulate any previously inaccessible habitat. 

But other parts of the United States have also seen dramatic fish returns. 
On south-central Wisconsin's Baraboo River, the removal of a string of 
dams has allowed sturgeon to reach their former spawning grounds. 
And in New England, the destruction of two dams 7-9 metres high on 
Maine’s Kennebec River and one of its tributaries has allowed Atlantic 
alewives (Alosa pseudoharengus) to repopulate 100 kilometres of previ- 
ously blocked-off river. In 1999, before the first dam was taken out, no 
alewives were recorded in the upper part of the watershed, says Serena 
McClain, head of river restoration for American Rivers. By 2013, the 
annual run had rebounded to around 3 million. 


QUAKE CONCERNS 

The next big structure destined for retirement is the 32-metre-tall San 
Clemente Dam on California's Carmel River. The 93-year-old dam, which 
was originally built to provide drinking water, is coming out because of 
concerns over its safety during an earthquake. And there are expensive 
homes that could be flooded if even modest amounts of sediment were 
to escape and raise the stream bed, so the dam-removal plan seeks to 
avoid that, says East. Instead, the $84-million project will cut a notch in
a ridge near the upstream end of the reservoir, then divert the water into 
a nearby drainage that rejoins the original river downstream of the dam. 
“It’s a major engineering feat,” she says.

Researchers say that the surge in large dam removals in the past ten 
years has offered valuable insight into how rivers and their ecosystems 
respond to letting the water flow freely. But because every river and dam 
is different, it is hard to draw simple lessons that will apply in all situations, 
says Jim Pizzuto, a fluvial geomorphologist at the University of Delaware 
in Newark. 

Still, the projects have shown that fish are remarkably adept at finding 
their way back. “If you un-build it, it seems like they will come back,” 
says Grant. 

At least, that is the sense emerging from the limited data so far. 
Researchers are struggling to get detailed statistics on fish recovery — 
partly because removal projects tend to be planned according to engineer- 
ing standards, not ones focused on fish and other river residents. And 
when fish assessments are done, they tend to be carried out by various 
state and federal agencies that share data only to a limited degree. “A lot 
of studies wind up on someone's computer, somewhere,” says McClain. 

But that may be changing because ecological considerations are increas- 
ingly part of dam-removal projects. A case in point is Maine's Penobscot 
River, where a $62-million public-private partnership is buying dams and 
removing them to provide better access for fish to more than 1,600 kilo- 
metres of the river and its tributaries. 

For a country once so bent on taming rivers, attitudes are quickly 
evolving. At the site of the former Condit Dam, a couple pulls into the 
car park and walks to a spot overlooking the water. “I come from a
dam-building family,” says the man. “My father used to build things like this
down in California — the Feather River, the Rubicon, the Yuba. I helped.” 

He pauses. 

“A hundred years is a great thing, isn’t it? Now we’re busily employing
people to undo what our ancestors screwed up.” He stares silently for a
moment at the ribbon of river, flecked with foam, 40 metres below. “It’s
a great thing.” ■


Richard A. Lovett is a freelance writer in Portland, Oregon. 
WRITTEN IN BLOOD


DNA circulating in the bloodstream could guide cancer treatment — 
if researchers can work out how best to use it. 


In 2012, Charles Swanton was forced to
confront one of cancer’s dirtiest tricks. 
When he and his team at the Cancer 
Research UK London Research Institute 
sequenced DNA from a handful of kid- 
ney tumours, they expected to find a lot 
of different mutations, but the breadth 
of genetic diversity within even a single 
tumour shocked them. Cells from one end 
differed from those at the other and only one- 
third of the mutations were shared through- 
out the whole mass. Secondary tumours that 
had spread and taken root elsewhere in the 
patients’ bodies were different again’. 

The results confirmed that the standard prog- 
nostic procedure for cancer, the tissue biopsy, 
is woefully inadequate — like trying to gauge a 
nation’s behaviour by surveying a single street. 
A biopsy could miss mutations just centimetres 
away that might radically change a person's 
chances for survival. And although biopsies 
can provide data about specific mutations that 
might make a tumour vulnerable to targeted 
therapies, that information is static and bound 
to become inaccurate as the cancer evolves. 

Swanton and his team laid bare a diversity 
that seemed insurmountable. “I am still quite 
depressed about it, if I’m honest,” he says. “And
if we had higher-resolution assays, the com- 
plexity would be far worse.” 

But researchers have found ways to get 
a richer view of a patient’s cancer, and even 
track it over time. When cancer cells rupture 
and die, they release their contents, including
circulating tumour DNA (ctDNA): genome 
fragments that float freely through the blood- 
stream. Debris from normal cells is normally 
mopped up and destroyed by ‘cleaning cells’ 
such as macrophages, but tumours are so 
large and their cells multiply so quickly that 
the cleaners cannot cope completely. 

By developing and refining techniques for 
measuring and sequencing tumour DNA in 
the bloodstream, scientists are turning vials 
of blood into ‘liquid biopsies’ — portraits of 
a cancer that are much more comprehensive 
than the keyhole peeps that conventional 
biopsies provide. Taken over time, such blood 
samples would show clinicians whether treat- 
ments are working and whether tumours are 
evolving resistance. 

As ever, there are caveats. Levels of ctDNA 
vary a lot from person to person and can be 
hard to detect, especially for small tumours 
in their early stages. And most studies so 
far have dealt with only handfuls or dozens 
of patients, with just a few types of cancer. 
Although the results are promising, they 
must be validated in larger studies before 
it will be clear whether ctDNA truly offers 
an accurate view — and, more importantly, 
whether it can save or improve lives. “Just 
monitoring your tumour isn’t good enough,”
says Luis Diaz, an oncologist at Johns Hopkins
University in Baltimore, Maryland. “The
challenge that we face is finding true utility.” 
If researchers can clear those hurdles, liquid
biopsies could help clinicians to make better
choices for treatment and to adjust those
decisions as conditions change, says Victor
Velculescu, a genetic oncologist at Johns Hopkins.
Moreover, the work might provide new
therapeutic targets. “It will help bring personalized
medicine to reality,” says Velculescu.
“It’s a game-changer.”


DELAYED ACTION 

Scientists first reported finding DNA circu- 
lating in human blood in 1948 (ref. 2), and 
specifically in the blood of people with can- 
cer in 1977 (ref. 3). It took another 17 years to 
show that this DNA bore mutations that are 
hallmarks of cancer — proof that it originated 
from the tumours*”. 

The first practical use of circulating DNA 
came in another field. Dennis Lo, a chemical 
pathologist now at the Chinese University of 
Hong Kong, reasoned that if tumours could 
flood the blood with DNA, surely fetuses could, 
too. In 1997, he successfully showed that preg- 
nant women carrying male babies had fetal 
Y chromosomes in their blood’. That discovery 
allowed doctors to check a baby’s sex early in 
gestation without disturbing the fetus, and ulti- 
mately to screen for developmental disorders 
such as Down's syndrome without resorting to 
invasive testing. It has revolutionized the field of 
prenatal diagnostics (see Nature 507, 19; 2014). 

“Cancer has been slower to catch on,” says 
Nitzan Rosenfeld, a genomicist at the Cancer
Research UK Cambridge Institute. This is 
partly because tumour DNA is much harder to 
detect than fetal DNA. There is typically less of 
it in the blood, and the amounts are extremely 
variable. In people with very advanced cancers, 
tumours might be the source of most of the 
circulating DNA in the blood, but more com- 
monly, ctDNA makes up barely 1% of the total 
and possibly as little as 0.01%. Early sequencing 
technologies were not up to the task of detect- 
ing it — at least, not consistently or reliably 
enough to use ctDNA as a biomarker. 

But the past decade has brought sensitive 
techniques that can detect and quantify minute 
amounts of DNA. For example, an amplification
method known as BEAMing, which fastens
circulating DNA to magnetic beads that
can then be isolated and counted, can detect
ctDNA even if it is outnumbered by healthy 
cell DNA by a factor of 10,000 to 1. 

Genetic oncologists Bert Vogelstein and 
Kenneth Kinzler at Johns Hopkins developed 
the technique, and in 2007 they described’ 
using it to track ctDNA in 18 people who were 
being treated for bowel cancer. After surgery, 
the patients’ ctDNA levels fell by 99%, but in 
many cases the signal did not disappear com- 
pletely. In all but one of the people with detect- 
able ctDNA at the first follow-up appointment, 
the tumours eventually returned. None of the 
people with undetectable levels after surgery 
experienced a recurrence. 


These results suggested that ctDNA can 
reveal how well a patient has responded to sur- 
gery and whether they need chemotherapy to 
finish off any lingering cancer cells. Research- 
ers soon found similar results for other types 
of cancer. Rosenfeld and his Cancer Research 
UK colleagues James Brenton and Carlos Caldas 
showed that ctDNA provides a precise portrait 
of advanced ovarian and breast cancers⁸. And in
the largest study yet, Diaz and other members 
of the Johns Hopkins group detected ctDNA in 
at least 75% of patients with advanced tumours, 
in organs as diverse as the pancreas, bladder, 
skin, stomach, oesophagus, liver and head and 
neck’. (Brain cancers were a notable exception, 
because the blood-brain barrier stops tumour 
DNA from reaching the bloodstream.) 


BETTER BIOMARKERS 

Circulating DNA might perform better than 
the protein biomarkers that researchers have 
been seeking and refining for decades. Proteins 
are used in the clinic to diagnose illnesses and 
monitor people undergoing treatment. For 
example, prostate-specific antigen is a bio- 
marker for prostate cancer, but it can give false 
positives because there are other reasons that 
the antigen can be elevated in the blood. False 
positives should be rarer with ctDNA because 
it is defined by mutations and other genomic 
changes that are hallmarks of cancer cells. And 
although most protein biomarkers stay in the 
blood for weeks, ctDNA has a half-life of less 


© 2014 Macmillan Publishers Limited. All rights reserved 


than two hours, so it gives a clearer view of 
a tumour’s present, rather than its past. The
Cambridge and Johns Hopkins teams have 
found that ctDNA is more sensitive than pro- 
tein biomarkers when it comes to detecting 
breast’ and bowel’ cancers, respectively, and 
it is more accurate at tracking tumour dis- 
appearance, spread and recurrence. 

Both teams also showed that ctDNA was 
more sensitive than circulating tumour cells 
— intact cancer cells that also travel around 
the bloodstream and have been an intense area 
of research. In a sub-study of 16 people, Diaz’s
team found that where both were present, 
ctDNA fragments outnumbered circulating 
tumour cells by 50 to 1 (ref. 9). And although 
ctDNA was always there if the circulating cells 
were, 13 people with detectable tumour DNA 
had no trace of such cells. 

But most exciting to scientists, says Diaz, is 
the ability to watch tumours evolve and adapt 
over time: “It'll help us answer questions in 
oncology that have never been answered 
before.”

For example, why do so many targeted 
therapies eventually fail? Gefitinib and pani- 
tumumab are among several drugs that block 
the epidermal growth factor receptor (EGFR), 
a protein involved in cell growth and division 
that is overactive in a number of cancers. Peo- 
ple taking these drugs do very well — briefly. 
But after a few months, their cancers almost 
always develop resistance, often through 
changes to other genes, such as KRAS, which
is mutated in many cancers. 

To monitor patients and decide on the next 
course of action, clinicians would normally 
need to take multiple biopsies. But people with 
advanced cancer often have several tumours to 
test, and different parts of any single tumour 
could be resistant in different ways. Biopsies 
are invasive and risky, and difficult for inacces- 
sible and fragile organs such as the lungs. “You 
can't just go to the patient and get five more 
biopsies after the treatment fails,” says Vel- 
culescu. Taking blood is simple in comparison. 

In 2012, Diaz’s team reported" using ctDNA 
to study patients who were being treated with 
EGFR inhibitors. The researchers found 42 different
KRAS mutations that confer resistance;
on average, these turned up 5 months before 
imaging techniques showed that the tumours 
were progressing. The team was specifically 
looking for KRAS mutations, but Rosenfeld’s 
group has used ctDNA to identify resistance 
mutations from a blind start. Last year, the 
researchers described how they had sequenced 
the complete exomes — the 1% of the genome 
that encodes protein — in blood samples from 
six people being treated for advanced breast, 
lung or ovarian cancers. In five cases, the 
unguided search revealed routes to resistance, 
such as mutations that prevent drugs from 
binding to their target proteins”. 

Spotting resistance early would let clinicians 
take patients off toxic and expensive drugs that 
are unlikely to keep working. And by identify- 
ing the mutations that underlie the resistance, 
they could find effective alternatives or drug 
combinations. “The hope is that we can turn 
cancer from a deadly disease into a chronic 
one,” says Velculescu. “You treat someone with
one therapy and when it stops working, you
switch, or alternate back and forth.”


CLINICAL CAVEATS 

Despite its promise, ctDNA is not yet ready 
for a starring role in the clinic. For one thing, 
the most sensitive techniques for detecting it, 
such as BEAMing, rely on some knowledge of 
which mutations to look for. This knowledge 
can be provided by taking a biopsy, sequenc- 
ing its mutations, designing patient-specific 
molecular probes that target them, and using 
those probes to analyse later blood samples — 
a laborious approach that must be repeated for 
each patient. The alternative is to use exome 
sequencing, as Rosenfeld’s team did. This 
requires no previous knowledge about the can- 
cer, but it is prohibitively expensive to sequence 
and analyse every sample at the depth required 
to detect rare mutant fragments. 

Maximilian Diehn, a radiation oncologist 
at Stanford University in California, has tried 
to combine the best of both worlds. His team 
identified a small proportion of the genome 
— just 0.004% — that is repeatedly mutated in 
lung cancers’*. Whenever the researchers get a 
new blood sample, they sequence this fraction 
10,000 times over. This picks up even rare 
mutant fragments, and the focused approach 
keeps costs down. Because almost everyone 
with lung cancer has at least one mutation 
in these regions, the method should work in 
almost every patient, says Diehn. The team is 
now working to develop similar mutation pan- 
els for other types of cancer, and to validate the 
technique in clinical trials — work that could 
take several years. 
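
To see why reading the same small region 10,000 times helps with rare fragments, consider a deliberately simplified model, sketched here in Python. It assumes that each read independently comes from a tumour fragment with probability equal to the ctDNA fraction, and it ignores sequencing errors and the finite number of DNA molecules in a blood sample. Those assumptions, and the function names, are illustrative only and do not describe the Stanford team's actual method.

```python
# Toy model: how many tumour-derived reads to expect at a given depth.
# Assumption: reads are independent draws; sequencing error is ignored.


def expected_mutant_reads(depth: int, ctdna_fraction: float) -> float:
    """Mean number of reads at one position that come from tumour DNA."""
    return depth * ctdna_fraction


def prob_at_least_one_mutant_read(depth: int, ctdna_fraction: float) -> float:
    """Chance that at least one of `depth` reads is tumour-derived."""
    return 1.0 - (1.0 - ctdna_fraction) ** depth


if __name__ == "__main__":
    depth = 10_000  # "sequence this fraction 10,000 times over"
    # ctDNA fractions quoted earlier in the article: 1% and 0.01%
    for fraction in (0.01, 0.0001):
        mean = expected_mutant_reads(depth, fraction)
        chance = prob_at_least_one_mutant_read(depth, fraction)
        print(f"fraction {fraction:.2%}: mean {mean:.0f} reads, "
              f"P(at least one) = {chance:.2f}")
```

In this toy model, a ctDNA fraction of 1% yields about 100 tumour-derived reads per position, whereas a fraction of 0.01% yields about one on average, so such a mutation could easily be missed or confused with noise. That is consistent with the point made in the text that detection becomes harder when little tumour DNA is in circulation.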

Like practically all ctDNA biopsy tech- 
niques, Diehn’s approach does not do well at 
picking up early forms of cancer. In a small 
study”’, it detected every lung cancer of stage II 
or higher, but only half of stage I tumours. This
is understandable, because advanced cancers simply
discharge more DNA, but it limits ctDNA’s
potential as a cancer-screening tool.

Diehn says that more-sensitive techniques 
could overcome this problem, but Diaz dis- 
agrees. “The limiting factor is biology,” he 
says. “There just aren’t a lot of fragments 
in circulation.” And if ctDNA hints at the
presence of an undetected cancer, what then? 
“If you detect a mutation in the circulation, you 
don't know where it’s coming from,” says Diaz. 

There are other unknowns, too. Does 
ctDNA paint a truly representative portrait of 
a cancer? Do tumours that have spread to other
organs release as much DNA as the original 
tumours? Do all the cells in a tumour release 
as much ctDNA as each other? Diaz says that 
the only way to answer these questions is to do 
‘warm autopsies’ — to take samples and char- 
acterize all of a person’s tumours very soon 
after death, and compare them with ctDNA 
extracted in life. “This is the heavy lifting that'll 
need to be done in the field,” he says.

And the biggest question remains: does 
an accurate picture of tumour burden, or a 
real-time look at emerging mutations, actu- 
ally save patients or improve their quality of 
life? Even if doctors discover that someone's 
tumour has developed a resistance mutation, 
that insight is useless if there are no drugs that 
target the mutation. “The limitation is the real- 
ity of targeted therapies,” says Velculescu. “You 
get all this information, but so what? Our
approaches to understanding cancer are
outstripping our clinical options.”

Even if ctDNA does not yet affect outcomes, 
scientists say that it is an invaluable research 
tool, and clinicians are starting to collect it 
routinely. Swanton, for example, is leading a 
£14-million (US$24-million) lung-cancer study 
called TRACERx (Tracking Cancer Evolution 
Through Therapy), which will use both conven- 
tional biopsies and ctDNA collected once every 
three months. The circulating DNA may or may 
not provide clues that help the study partici- 
pants, but at the very least, it will give Swanton 
a much better understanding of how lung can- 
cer evolves, and how to control that evolution. 

As Rosenfeld argues, it is better to have this 
information than not to. Currently, he says, 
“we're groping in the dark. Why would you do 
that if you have a tool that allows you to see 
what’s happening?” ■


Ed Yong is a science journalist based in 
London. 
A girl drinks water from a tap in Anhui province, China.


A sustainable plan for 
China’s drinking water 


Tackling pollution and using different grades of water for different tasks is 
more efficient than making all water potable, say Tao Tao and Kunlun Xin. 


Making drinking water safe is a
priority in China. Serious health
and social problems concentrate 
in areas where the water quality is poor. 
Every year, 190 million people in China 
fall ill and 60,000 people die from diseases 
caused by water pollution such as liver and 
gastric cancers’. Around 300 million peo- 
ple face shortages of drinking water’. In a 
2009 nationwide assessment, one-quarter of 
4,000 urban water-treatment plants surveyed 
did not comply with quality controls, stoking 
public fears about the health impacts. 
The Chinese government is in the
middle of a five-year 410-billion-renminbi
(US$66-billion) programme to deliver safe 
drinking water to all town and city resi- 
dents — about 54% of the population — by 
2015. The focus is on upgrading 92,300 kilo- 
metres of mains pipes and thousands of 
water-treatment plants to developed-world 
standards. 

But this infrastructure-focused approach 
is ill-suited to China, which is projected 
to remain a developing country until at 
least 2050. Urban expansion will outpace 
improvements to public water systems, and 
treating polluted water will require large 
amounts of energy, expensive technologies
and chemicals. 

Instead, the government should focus on 
cleaning water sources and recycling water. 
The first priority must be to purge rivers 
and lakes of industrial and agricultural pol- 
lutants, and to prevent these from entering 
the water table in the first place. Cheaper 
technologies at the point of use, such as puri- 
fiers on taps, would be enough to deliver clean 
drinking water to most of China's population, 
because drinking water accounts for only a 
few percent of total consumption (see ‘China's 
water’). Water of lower quality can suffice
for laundry, bathing and kitchen use.

In 2012, China mandated that tap water 
in all cities should meet a standard based on 
106 indices called for by the World Health 
Organization. As well as investing in water 
treatment, distribution and quality moni- 
toring, the government has added drinking- 
water safety to its list of 13 major technology 
projects, which include lunar-exploration 
and crewed space programmes. Billions have 
been spent on researching drinking-water 
problems in key river basins and lakes. But so 
far, only a few cities meet the desired standard. 


SHORT SUPPLY 

It is a problem of supply and demand. 
Supply is a challenge because almost half 
of China's water sources are polluted. Wells 
and aquifers are contaminated with fertiliz- 
ers and pesticide residues and heavy met- 
als such as arsenic and manganese from 
mining, the petrochemical industry and 
domestic and industrial waste. More than 
three-quarters (76.8%) of 800 wells moni- 
tored in nine provinces, plus autonomous 
regions and municipalities, including Bei- 
jing, Shanghai and Guangzhou, failed to 
meet standards for groundwater in a 2011 
national evaluation’. 

Water demand is a challenge because of 
runaway economic growth and urbaniza- 
tion. China is short of 40 billion tonnes of 
water a year on average. In 2011, 665 cities 
consumed in excess of 44 billion tonnes of 
water, or an average of around 66 million 
tonnes each. By 2020, when China's urban 
population proportion is projected to reach 
60%, cities might need 58 billion tonnes of 
water. 

But take a closer look at how that water 
is used, and the problem becomes tractable. 
Almost two-thirds of municipal water is 
used by industry, agriculture and construc- 
tion. Households consume the remaining 
third (365 million people used 15.3 bil-
lion tonnes of water in 2011). Of that,
laundry, bathing and washing up take up
most (together more than 80%). Cooking
and drinking use just over 2% of the
municipal total (1.1 billion tonnes). In
other words, most household
water need not be fit to drink.
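
The shares quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and the 44-billion-tonne total is a lower bound ("in excess of"), so the resulting percentages are approximate.

```python
# Figures quoted in the text for 2011, in billion tonnes of water.
MUNICIPAL_TOTAL = 44.0    # consumed by 665 cities (stated as "in excess of")
HOUSEHOLD_TOTAL = 15.3    # used by 365 million urban residents
COOKING_DRINKING = 1.1    # used for cooking and drinking
CITIES = 665


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Average consumption per city, in million tonnes.
per_city_million = MUNICIPAL_TOTAL * 1000 / CITIES

# Household use as a share of all municipal water.
household_share = share(HOUSEHOLD_TOTAL, MUNICIPAL_TOTAL)

# Cooking and drinking as a share of municipal and of household water.
drinking_of_municipal = share(COOKING_DRINKING, MUNICIPAL_TOTAL)
drinking_of_household = share(COOKING_DRINKING, HOUSEHOLD_TOTAL)

print(f"Average per city: {per_city_million:.0f} million tonnes")
print(f"Household share of municipal water: {household_share:.1f}%")
print(f"Cooking and drinking, share of municipal water: {drinking_of_municipal:.1f}%")
print(f"Cooking and drinking, share of household water: {drinking_of_household:.1f}%")
```

On these numbers, cooking and drinking come to roughly 2.5% of municipal water and about 7% of household water, so the "just over 2%" figure appears to refer to the municipal total.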

Bringing a large developing country such 
as China up to the same standard as a devel- 
oped country will require more-intensive 
water treatment. This has environmen- 
tal consequences. In Jiangsu province, 
for example, carbon dioxide emissions
increased by 28% in 2012 when a type of
water filtration called ozone-biological
activated carbon treatment was extended
to one-quarter of the province's supply
(5.3 million tonnes per day). China is in need
of cheap, energy-efficient methods of water 
purification that minimize chemical use’. 

Even if tap water becomes drinkable, few 
people will stop boiling drinking water, a 
habit that is ubiquitous in China. Boiling 
kills or deactivates all waterborne patho- 
gens, including protozoan cysts such as 
cryptosporidium that can be resistant to 
chemical disinfection, and viruses such 
as rotavirus and norovirus that are too 
small to filter out’. Even if the water is tur- 
bid, boiling can remove microorganisms 
and volatile organic compounds such as 
benzene and chloroform. 

In this context, purification systems that 
improve drinking water at the point of use 
are a good fit. In Kenya, Bolivia and Zambia, 
water purifiers have been shown” to reduce 
diarrhoeal disease by 30-40%. Fewer than 
5% of Chinese homes currently have these, 
despite a unit costing only around 1,500 to 
2,000 renminbi. 

China’s water-purification industry is 
growing by about 40% a year — fewer peo- 
ple are buying water dispensers and barrelled 
water. But water-purification devices are
unregulated. Incomplete after-sales service
leads to improper maintenance; delays in
changing filter cartridges can introduce
microorganisms. Filters and units made
from toxic materials such as non-food-grade
plastic are ineffective.

CHINA'S WATER
Nearly half of 634 Chinese rivers, lakes and reservoirs tested in 2011 failed to meet drinking standards for all or part of the year (left). Purifiers on taps could provide the small proportion of water for drinking (right).
[Figure: two pie charts. Water quality: drinkable year-round; sometimes potable; never potable. Urban water use: laundry; flushing, cleaning; drinking, cooking water; other.]
*Includes service-industry, construction, agricultural and ecological uses.

Treated grey water (waste water from 
showers and baths) and black water (from 
toilets) are increasingly used in China for 
industrial and irrigation purposes, and for 
flushing in new residences. But this type 
of recycling is impractical for most exist- 
ing households, owing to the high cost 
and disruption of installing the necessary 
plumbing. 


SOME FOR ALL 

What next? By using cheap, low-carbon 
water purifiers in all homes, China can 
avoid the technology ‘lock-in’ that leads
developed countries to waste potable water, 
and leap-frog to a sustainable supply sys- 
tem. In the long term, the improvement of 
water sources will ensure that most people 
have safe drinking water. 

The local and national governance struc- 
tures overseeing water supply and water- 
source pollution must be merged and an 
agency created to manage them — currently 
these are administered separately, by the Min- 
istry of Housing and Urban-Rural Develop- 
ment, Ministry of Water Resources and 
the Ministry of Environmental Protection. 
Responsibilities need to be clearly defined. 

Regulation and standards for water puri- 
fiers should be enforced. Business models for 
scaling up these technologies require inves- 
tigation and testing of factors including the 
costs of production and implementation, 
pricing, subsidy and microfinance’. 

As the Global Consultation on Safe Water 
and Sanitation for the 1990s° put it, China’s 
aim in supplying clean water should be: 
“Some for all rather than more for some”.


Tao Tao is professor and Kunlun Xin

DIETER TELEMANS/PANOS


Masai women from Kenya take a course on solar energy in India. 


Energy studies 
need social science 


A broader pool of expertise is needed to understand 
how human behaviour affects energy demand and the 
uptake of technologies, says Benjamin K. Sovacool. 


To secure a safe, reliable and low-carbon
energy future, we must alter both
technologies and human behav-
iour’. The US Department of Energy notes*
that supply and demand is “affected as much 
by individual choice, preference, and behav- 
ior, as by technical performance”. 

Yet many researchers and policy-makers 
continue to focus on only one side of the 
energy dilemma. In the United States, for 
every dollar in research funds spent on behav- 
ioural and demand-side energy research, $35 
is spent on energy supply and infrastructure’. 
Social sciences, humanities, and the arts are 
marginalized in energy research, and major 
statistical agencies do not usually collect 
qualitative data about energy consumption. 
Similar problems are apparent in Europe’. 

My analysis of the peer-reviewed energy- 
research literature shows how biases handi- 
cap the field’. Engineers and economists are 
ignoring people and miscasting decision- 
making and action. Academic researchers
frequently obsess over technical fixes rather
than ways to alter lifestyles and social norms’. 
Interdisciplinary research remains stymied by 
institutional barriers in academia and govern- 
ment’. National and local energy bodies have 
conventionally had few social scientists on 
staff*. And most leading journals in the field 
focus on one discipline. 

Now the energy field needs to learn from 
health, agriculture and business, and bring 
together social and physical scientists. Uni- 
versities should develop courses focused on 
solving energy problems, granting agencies 
should prioritize and direct more money to 
behavioural work, and energy journals should 
broaden their scope. Already, there are prom- 
ising examples of how inclusive and inter- 
disciplinary energy research can encourage 
energy efficiency, and so address global envi- 
ronmental challenges such as climate change’. 

I examined the authorship and scope of 
4,444 full-length articles over 15 years (1999 
to 2013) in three leading energy technology
and policy journals: Energy Policy and The
Energy Journal have high impact factors, and 
The Electricity Journal was included to sample 
a regulatory journal. I found four worrisome 
trends: an undervaluation of the influence 
of social dimensions on energy use; a bias 
towards science, engineering and economics 
over other social sciences and the humanities; 
a lack of interdisciplinary collaboration; and 
the under-representation of female authors or 
those from minority groups. 

For instance, technology adoption, the 
complexity of choice-making, and the 
human dimensions of energy use and 
environmental change were rarely covered 
(see ‘Neglected topics’). Most articles (85%) 
focused on advanced energy-production 
systems, such as nuclear reactors, sources 
of renewable electricity and biofuels, or the 
technical elements of electricity generation, 
transmission and distribution — hardware 
— rather than the human ‘software’ behind 
it. Simple devices such as cooking stoves, 
bicycles, light bulbs and distributed gen- 
eration were studied in less than 3.5% of 
articles. Behaviour and energy demand was 
investigated in less than 2.2% of papers. If 
this work is being published, it is in environ- 
mental sociology, psychology and political- 
science journals that few energy researchers 
read. 


SOCIAL OUTCASTS 

Social-science authorship and citations are 
also relatively low (see ‘Publishing trends’). 
Science, engineering, economics and sta- 
tistics account for more than half (67%) 
of institutional affiliations as reported by 
authors; non-economic social science for 
less than 20%. Sociology, geography, his- 
tory, psychology, communication studies 
and philosophy each constituted less than 
0.3% of author affiliations. 

References to social-science and humani- 
ties journals, with their insights into how con- 
sumers and politicians behave, were less than 
4.3% of 90,097 citations across the sample. 
Little research took place in the ‘real world’.
Most studies are the result of work under- 
taken at the bench or desk using computer 
models and experiments, rather than field 
research, interviews and surveys. 

Another trend is that the scientists and 
engineers writing in these journals rarely 
collaborate beyond their fields. About half of 
published authors in the sample wrote alone 
and one-quarter published with colleagues 
within their discipline. Less than 23% of arti- 
cles involved interdisciplinary collaborations 
between authors. 

Furthermore, the vast majority of authors 
hail from affluent Western institutions and
countries where research money is abun- 
dant. They focus on problems facing the 
industrialized world. Of the 9,549 authors 
who listed their country of residence, 87%
came from either North America or western
Europe. African, Asian, Latin American and 
Middle Eastern authors were few. Authors 
were mostly male: only 15.7% could be iden- 
tified as female. Norms of authorship and 
collaboration vary, but these trends held 
for each year examined: female authorship 
remained below 17.4% and non-Western 
authorship under 16%, for example. 


FIVE RECOMMENDATIONS 

To bring in social scientists and other 
marginalized researchers, I have five recom- 
mendations. 

First, public and private organizations 
should overhaul the way they structure and 
disburse funding for energy research and 
development. They should give a bigger slice 
to social scientists, improve incentives for 
interdisciplinary work and prioritize social 
topics in their funding calls, such as the per-
ceptions of energy users, the needs of people
affected by energy production and prevailing 
customs, traditions and behaviours. 

Second, to reduce disciplinary bias, energy 
ministries, statistical agencies and public 
utility commissions should focus more on 
energy behaviour and demand, rather than 
just supply. Delaware and the District of 
Columbia, for instance, have sustainable- 
energy utilities, which advise residents about 
behavioural changes they can make to save 
energy and money. The statewide energy- 
efficiency utility, Efficiency Vermont, pro- 
vides funding and behavioural guidance to 
homes, farms and factories. 

Third, administrators should make 
energy research more problem-oriented, 
including social perspectives as a matter of 
course. Universities should develop topi- 
cal programmes on energy, as they have in 
agricultural research, medicine and busi- 
ness. Curricula might include efficient and 
sustainable consumption, risk management, 
public decision-making and the design 
of technologies for public acceptance and
use. Good examples include the Univer-
sity of Edinburgh, UK, which offers an
interdisciplinary master’s degree in climate
accounting; Aarhus University in Denmark
has a business-development degree that
combines engineering, innovation studies,
energy studies, business and marketing; and
Carnegie Mellon University in Pittsburgh,
Pennsylvania, has an engineering and public
policy department. Outside academia, the
US Defense Advanced Research Projects
Agency has successfully used a ‘challenges-
centred’ approach to national-security
problems since it was created in 1958.

Fourth, researchers should do more to
accommodate expertise and data from lay-
persons, indigenous groups, community
leaders and other non-conventional par-
ticipants. Although this may require special
training to do effectively, such interactions
would encourage greater feedback and inte-
grate diverse viewpoints.

NEGLECTED TOPICS
Twelve subjects seldom considered in energy studies.

Topic: Example
Gender and identity: Pollution from cooking stoves posing greater risk to women than men
Philosophy and ethics: Future generations bearing the burden of pollution
Communication and persuasion: Energy information changing individual or firm behaviour
Geography and scale: Mismatching the size of energy systems to patterns of demand
Social psychology and behaviour: Shaping energy choices by trust, control and denial
Anthropology and culture: Temporal and regional differences in conceptions of energy services
Research and innovation: How people, markets and institutions drive innovation
Politics and political economy: Resources contributing to conflict or stymying growth
Institutions and energy governance: Evolving rules and norms to address collective energy problems
Energy and development: Energy use contributing to economic growth and falling poverty
Externalities and pollution: Costs to society of erosions of environmental and ecological capital
Sociology of technology: Economic, political and social drivers of energy consumption

PUBLISHING TRENDS
Social-science studies were rarely published in three leading energy journals from 1999 to 2013. The
emphasis on technology rather than human behaviour in energy research is reflected in the disciplinary
backgrounds of authors, work referenced, and methods used.

AUTHOR’S DISCIPLINE
Science, engineering and energy: 46.7%
Economics and statistics: 20.3%
Social sciences: 19.6%
Other: 8.3%
Architecture and buildings: 3.1%
Life sciences: 2.0%
Arts and humanities: 0.1%

CITED SOURCES
Non-classified and grey literature: 60.8%
Science: 10.1%
Economics: 9.9%
Books: 7.6%
Self-citations: 7.3%
Social science: 4.2%

ARTICLE METHODS
Quantitative: 58%
Not applicable: 29%
Qualitative: 13%

Fifth, journal editors can prioritize
interdisciplinary, inclusive, comparative,
mixed-methods research. A new journal
published by Elsevier, Energy Research & 
Social Science (of which I am editor-in-chief),
calls explicitly in its aims and scope for papers 
that blend disciplinary concepts, go beyond 
single case studies, and utilize an assortment 
of methods. Wiley Interdisciplinary Reviews: 
Energy and Environment also seeks cross- 
disciplinary assessments of energy systems. 

Energy studies must become more socially 
oriented, interdisciplinary and heterogene- 
ous. Problem-focused research activities that 
centre on both physical and social processes, 
include diverse actors and mix qualitative 
and quantitative methods, have a better 
chance of achieving analytic excellence and 
social impact.
COMMENT | BOOKS & ARTS


It is not necessary to mimic another’s actions to understand them. 


Looking-glass wars 


Patricia Smith Churchland welcomes a critique of the 
mirror-neuron theory linking brain and behaviour. 


Well-grounded theories that
connect neurons with behaviour
are highly prized but in short
supply. One behaviour we would dearly love
to explain is how humans, along with some 
birds and mammals, ‘mind-read’ — that is, 
attribute mental states such as goals, inten- 
tions and feelings to others, to predict and 
understand their actions. Because others’ 
motivations are not directly observable, the 
capacity to intuit them has seemed to require 
some special explanation. 

The discovery several decades ago of 
‘mirror neurons’ in the premotor cortex of
macaque monkeys spawned the idea that 
these neurons provide that special expla- 
nation. As described by neuroscientists 
Giacomo Rizzolatti and Laila Craighero, 
mirror neurons respond both when one 
monkey sees another make a certain move- 
ment and when the animals make the same 
movements themselves. 

The idea that a brain’s capacity to mind- 
read emerges more or less automatically from
the activity of mirror neurons was articulated
thus by neuroscientist Marco Iacoboni in 
Mirroring People (Farrar, Straus and Giroux, 
2008): “Mirror neurons undoubtedly provide, 
for the first time in history, a plausible neuro- 
physiological explanation for complex forms 
of social cognition and interaction.” Examples 
of such social cognition might be my know-
ing what you intend when I see you head
for the chicken coop with an axe in hand,
or my understanding what a baseball player
feels when he strikes out in the last inning.
In The Myth of Mirror Neurons, cognitive
scientist Gregory Hickok undertakes a balanced
and detailed examination of claims that
have flourished in the past ten years: that
mirror neurons are the key to explaining our
capacity for reading other minds.

The Myth of Mirror Neurons: The Real Neuroscience of Communication and Cognition
GREGORY HICKOK
W.W. Norton: 2014.

The mirror-neuron approach to mind- 
reading depends on the assumption that our 
own inner lives are transparently revealed to 
our own minds. So when we see someone 
holding an axe while heading to the hen 
house, that observation activates in us not 
only the motor programme for that move- 
ment, but also the mental state that is its 
normal antecedent — intent to slaughter a 
chicken. Simulation of observed behaviour, 
so this argument goes, allows us to identify 
others’ intentions. Iacoboni declares that 
mirror neurons “are at the heart of how we 
navigate through our lives”. 

These are bold and promising ideas, and 
Hickok wants to know whether the research 
makes good on the promise. One basic prob- 
lem that he sees is this: the evidence shows 
that mirror neurons respond to movements, 
one’s own and others’. The claim is that mir-
ror neurons reflect high-level understand- 
ing of goals. But how? Rizzolatti and his 
colleagues tried to address this matter by 
designing an experiment in which a mon- 
key is trained either to put food in its mouth 
or to put an object in a cup stationed near its 
mouth: similar movements, different goals. 

When other monkeys witness these 
actions, the responses of mirror neurons in 
their inferior parietal lobes vary depending 
on whether the action-performer grasps to 
eat or to place. So are these neurons sensi- 
tive to observing similar movements with 
different goals? Here things get complicated 
because the non-food object was taken from 
a jar, but the food was not. In Hickok’s view, 
this leaves open the question of whether 
the mirror neurons are reading the goal or 
merely responding to different movements. 

As Hickok sizes it up, the responses of the 
witness monkeys’ mirror neurons seem to 
be explicable in terms of past associations, 
implying that the claim of mind-reading 
through simulation is superfluous. Likewise 
with my expectations regarding the unfortu- 
nate hen: my brain does not need to produce 
a simulation of your behaviour to know your 
motive, because in the past I have detected 
axe-wielding in the vicinity of chickens before 
they are slaughtered. Adding to the scepti- 
cism, Hickok points out that you can under- 
stand many actions that you never perform. 
Hickok’s dog, who never throws the ball him- 
self and thus cannot simulate ball-throwing, 
nevertheless reliably predicts the ball’s trajec- 
tory by watching Hickok’s arm. As for simu- 
lating feelings, I may know that a baseball 
player is disappointed after striking out, but 
feel only joy if it means my team is winning.

Some scientists sought to draw support for 
the simulation hypothesis from the motor 
theory of speech perception (MTSP). In 
brief, the idea of MTSP, popular in the 
1950s, is that I can understand what you
mean when you say, “The cat is swimming”
by recreating that bit of speech in my brain’s 
speech area. Language is Hickok’s area of 
expertise, and he reminds us of the experi- 
ments that saw MTSP shelved. For example, 
people with a disorder called Broca’s aphasia 
are unable to produce speech, but they can 
still understand it, as can children in the pre- 
speech language-learning phase of develop- 
ment and people born with cerebral palsy 
who have severely impaired speech produc- 
tion. Some researchers dismiss those flaws 
in MTSP on the grounds that the mirror- 
neuron story explains language understand- 
ing. The circularity here is not reassuring. 

Not least of the problems with the mirror- 
neuron approach is that learning mind- 
reading skills cannot be just a matter of 
simulation, because such skills depend on a
co-evolution of understanding of the self and
of others. Recognition of one’s own inner 
states is not a computational freebie. 

How fares the hypothesis that autism is 
fundamentally a mirror-neuron disorder? So 
far, it is mixed. A deeper perspective derives 
from post-mortem studies of the brains of 
youngsters with autism. These show patches 
of laminar disorganization — types of neuron 
in the wrong layer making the wrong connec- 
tions — in wide swathes of the prefrontal cor- 
tex, including areas important for executive 
function, motor control and social cognition, 
as well as areas that probably contain some 
mirror neurons. This suggests that autism 
is not primarily or essentially a disorder of 
a hypothetical mirror-neuron system, but a 
broader disorder that affects many aspects of 
normal brain function, including cognition. 

Hickok does not for a moment deny that 
we mind-read. Rather, his point is that the 
roles of mirror neurons and simulation have 
been oversold. The upshot of his inquiry is 
an analogue of the familiar warning: if it 
seems too good to be true, it probably is. 

Hickok’s critique deserves to be widely 
discussed, especially because many scientists 
have bought into the mirror-neuron theory 
of action understanding, perhaps because 
they lack the time or inclination to peer into 
its workings themselves. Hickok performs a 
valuable service by laying out the pros and 
cons clearly and fairly. He ends by agree- 
ing that although mirror neurons may well 
have a role in explaining communication 
and empathy, many other neural networks 
with complex responses are undoubtedly 
involved. Those networks and their roles 
are still to be clarified.


Patricia Smith Churchland is professor 
emerita of philosophy at the University

Books in brief


Invisible: The Dangerous Allure of the Unseen

Philip Ball BODLEY HEAD (2014)

Young children, notes science writer Philip Ball, believe they vanish
when they shut their eyes. Such beliefs wither, but “the dream and the
desire” for invisibility remain, and Ball traces these through history.
The urge has spawned occultism, stage magic, a fascination with
camouflage, and legends centring on rings and cloaks. It re-emerged
a century ago in the confluence of paranormal beliefs and the new
physics and, today, in optical physicists’ invisibility shields. Ball
argues that this “mythical lens” we train on reality inspires scientific
discovery, but we need to understand its calibration.


H is for Hawk 

Helen Macdonald JONATHAN CAPE (2014) 

This extraordinary book is ostensibly about falconry. It actually tells 
how a human wild with grief came to fathom a wild mind — a process 
in which the question of who was being tamed was always up in
the air. Writer Helen Macdonald, devastated by her father’s death,
took on a goshawk. Her narrative interweaves exquisitely rendered 
observations — of hawk behaviour, her immersion in the bird’s world 
and what happens between them — with the life and work of author 
T. H. White, whose 1951 The Goshawk inspired her as a child. Soars 
beyond genres, and burns with emotional and intellectual intensity. 


Shocked: Adventures in Bringing Back the Recently Dead 

David Casarett CURRENT (2014) 

In 1986, a toddler named Michelle Funk drowned and lay dead for 
three hours before a medical team coaxed her back to life. Decades 
later, relates physician David Casarett, the science of resuscitation is 
very much alive. In this disarmingly amusing investigation, Casarett 
covers breakthroughs, devices, hazards and case studies. He visits 
resuscitative techniques of the past, such as blowing tobacco smoke 
into the victim’s rectum; the cellular effects of methods using 
electricity and low temperature; and potential future advances, 
including reducing metabolism. 


Riveted: The Science of Why Jokes Make Us Laugh, Movies Make 
Us Cry, and Religion Makes Us Feel One with the Universe 

Jim Davies PALGRAVE MACMILLAN (2014) 

Moments that jolt or delight us punctuate our lives. But whereas 
shock might be salutary in an art gallery, it can trigger blind belief in 
other contexts, points out cognitive scientist Jim Davies. Expounding 
his theory of ‘compellingness foundations’, Davies synthesizes 
research on what makes us susceptible to gripping stimuli, such
as our drives to discover patterns and to find incongruity, and our
attraction to hope and fear. Scepticism, he argues, can help us to 
build resistance to riveting ideas that turn out to be duds. 


Malthus: The Life and Legacies of an Untimely Prophet 

Robert J. Mayhew BELKNAP (2014) 

Loathed by Karl Marx and admired by Charles Darwin, Enlightenment 
scholar Thomas Malthus still polarizes, notes historian Robert 
Mayhew. The flashpoint was Malthus’s 1798 An Essay on the Principle 
of Population, which posits that although humans are prodigal,
nature and resources are limited. Mayhew traces that theory through
revolutionary and reactionary traditions, arguing that it remains 
pertinent in an era of economic downturn and shrinking resources, 
with predictions of 10 billion humans by 2050. Barbara Kiser

Carbon cost will not
stop oil-sands work 


Wendy Palen and colleagues
propose a moratorium on
new oil-sands projects until
regulations are in place to 
ensure compliance with 
carbon-emissions commitments 
(Nature 510, 465-467; 2014). 
We question whether such a ban 
is justified on the basis of the 
criteria they propose. 

Emissions from oil-sands 
production are generally 
less than 0.1 tonne of carbon 
dioxide equivalent per barrel, 
so production costs would 
increase by at most US$6.50 per 
barrel if social costs were 
accounted for as the authors 
suggest. The social cost of 
carbon is estimated at about 
$65 per tonne of CO₂ equivalent
over the lifespan of an oil- 
sands project (see go.nature. 
com/9ztmyu). 
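
The per-barrel figure follows from multiplying the emissions intensity by the carbon price. The sketch below reproduces that arithmetic with the letter's own numbers; the function and variable names are illustrative, and the $100 case simply applies the higher carbon price that the letter mentions later.

```python
# Figures quoted in the letter.
EMISSIONS_T_PER_BARREL = 0.1    # upper bound, tonnes of CO2 equivalent per barrel
SOCIAL_COST_USD_PER_T = 65.0    # estimated social cost of carbon, US$ per tonne


def added_cost_per_barrel(emissions_t, price_usd_per_t):
    """Extra production cost per barrel if emissions are priced at the given rate."""
    return emissions_t * price_usd_per_t


# Upper bound on the added cost at the social cost of carbon.
cost_at_social_cost = added_cost_per_barrel(EMISSIONS_T_PER_BARREL, SOCIAL_COST_USD_PER_T)

# The same calculation at a carbon price of US$100 per tonne.
cost_at_100 = added_cost_per_barrel(EMISSIONS_T_PER_BARREL, 100.0)

print(f"Added cost at $65 per tonne: ${cost_at_social_cost:.2f} per barrel")
print(f"Added cost at $100 per tonne: ${cost_at_100:.2f} per barrel")
```

Because the emissions figure is an upper bound ("generally less than 0.1 tonne"), both results are ceilings on the added cost, not point estimates.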

However, many projects 
would remain viable despite 
this production-cost increase, 
in part because it would be 
offset for developers by tax and 
royalty deductions. The energy 
company Suncor, for example, 
estimates that a similar carbon 
policy would decrease its return 
on investment on a new mine by 
just 0.4%. 

A moratorium is neither 
sufficient nor necessary for 
Canada to meet its greenhouse- 
gas commitments, or to achieve 
global carbon stabilization at 
450 parts per million (see also 
N.C. Swart and A. J. Weaver 
Nature Clim. Change 2, 
134-136; 2012). 

To meet its target, Canada 
would probably need to set 
a carbon price that exceeds 
the social cost of carbon 
estimates — say, more than 
$100 per tonne (see go.nature. 
com/dswyma) — and apply 
it nationally to all sources of 
emissions. Even then, oil-sands 
production could continue to 
grow. 

Andrew Leach, Branko 
Boskovic University of Alberta, 
Canada. 


aleach@ualberta.ca

Put brain project
back on course 


As an ambitious initiative of
the European Commission, the
Human Brain Project (HBP) 
must unite basic neuroscience 
research with information and 
communication technology (see 
www.humanbrainproject.eu). 
However, many neuroscientists 
are concerned that it has failed to 
do so (see http://neurofuture.eu). 
We believe that it is not too late 
to put the HBP on course and 
restore confidence by swift and 
decisive action. 

In our opinion, to let the HBP 
plough ahead without taking into 
account widespread views within 
the European neuroscience 
community would be akin to 
giving the lead on climate studies 
to the critics of global warming. 
Cooperative and effective 
large-scale research cannot be 
decreed: it has to emerge from 
inclusive discussion and respect 
for scientific argument. 

This will require the European 
Commission to implement 
significant changes. For 
example, the HBP charter 
needs to be amended to make 
its governance much more 
democratic: the direction of 
a project on this scale must 
reflect a maximally consensual 
scientific process. We also 
suggest that a neuroscience 
council should be created to 
formulate a strategy for Europe 
that is inclusive and scientifically 
driven, and which would help 
to drive partnering grants and 
international collaboration. 

We stand committed to 
working with the European 
Commission and the HBP, but 
inclusivity and good governance 
are essential to meet the huge 
challenges of understanding the 
human brain. 

Zachary F. Mainen 
Champalimaud Neuroscience

Review risks before
eradicating toads 


Jonathan Kolby and colleagues 
call for swift eradication of the 
invasive Asian common toad 
Duttaphrynus melanostictus 
from Madagascar (Nature
509, 563; 2014). We caution
against disproportionate 
countermeasures that are not 
founded on proper data and 
assessment. These could have 
detrimental effects on local 
ecosystems that are comparable 
to the threat posed by the toads 
themselves. 

Draining potential breeding 
ponds, for example (see Nature 
http://doi.org/ts3; 2014), could 
have an impact on local fauna 
or even on entire ecosystems. 
This approach would probably 
fail anyway because larvae of 
D. melanostictus can survive in 
streams, puddles and brackish 
waters. Also, efforts by amateur 
conservationists and locals to 
destroy toad spawn and larvae 
could jeopardize native frog 
species if people do not identify 
tadpoles or juveniles correctly 
(see, for instance, R. Somaweera 
et al. Biol. Conserv. 143, 
1477-1484; 2010). 

We consider the parallels 
drawn by Kolby and colleagues 
between D. melanostictus and 
the invasive cane toad (Rhinella 
marina) to be inappropriate. 
Invasion potential and the 
effects of alien species are hard 
to predict without sufficient 
data. To confirm a genuine 
biological invasion, information 
first needs to be collected on the 
toads’ range extension and the 
impact on local flora and fauna. 

Before implementing 
countermeasures, any negative 
effects should be evaluated. 
This calls for rapid assessment 
of the practical difficulties, risks 
and prospects of success.

Sven Mecke* Philipps-Universität
Marburg, Germany.
meckes@staff.uni-marburg.de
*On behalf of 12 correspondents
(see go.nature.com/wj2aju for
full list).


University managers 
misled by metrics 


University administrators 
wishing to arrive at rapid 
decisions in evaluating staff 
performance may ignore metrics 
that seem too sophisticated 
(Nature 510, 444; 2014, and see 
J. Adams Nature 510, 470-471; 
2014). At King’s College London, 
for example, appallingly blunt 
metrics are being wielded to 
determine who should be made 
redundant. 

Some faculty members there 
are being appraised on grant 
income or hours of contact 
teaching, but not both, and 
without regard to indicators such 
as publication record, teaching 
quality or editorial-board 
membership (see go.nature.com/bhbyjs).
These metrics are having
a disproportionate effect on staff 
with both research and teaching 
commitments — ironically, the 
university's stated ideal. 

Total grant income is in 
any case a questionable proxy 
for research quality, and 
cannot be used to compare the 
performance of researchers 
who have different outgoings 
and funding sources. Examples 
include basic and medical 
researchers, or those who work 
on model organisms that vary 
markedly in expense. Their grant 
sizes are unrelated to the quality 
of their research. 

Such misleading measures 
cannot inform the shrewd 
decision-making that is 
essential for tightly funded 
higher-education management. 
Thomas Butts King’s College 
London, UK. 
thomas.butts@kcl.ac.uk
The author declares competing
financial interests: see
go.nature.com/u9y5ti for details.


NEWS & VIEWS 


PALAEOCLIMATE SCIENCE 


Causes and effects of Antarctic ice 


Some 34 million years ago, there was a rapid growth of ice on Antarctica. A modelling study indicates that the ultimate 
cause of this glaciation was a decrease in the concentration of atmospheric carbon dioxide. SEE LETTER P.574 


DAN LUNT 


On page 574 of this issue, Goldner
et al.¹ tackle a long-standing debate in
palaeoclimate science: the causes and
effects of the largest climate transition of the 
past 50 million years, which occurred about 
34 million years ago. It was characterized by 
rapid cooling and growth of Antarctic ice, 
marking a change from the warm ‘greenhouse’ 
climates of the Eocene epoch to the ‘icehouse’ 
of the Oligocene epoch. 

Using a numerical climate model, the 
authors investigated the effects of inserting 
a continental ice sheet onto Antarctica, and 
found that spatial patterns of ocean cooling 
predicted by the model agree well with cooling 
patterns inferred from the geological record 
of this time period. This cooling, and associ- 
ated ocean-circulation changes, had previ- 
ously been attributed to geographical changes 
in ocean straits and seaways, but Goldner and 
colleagues conclude that the cooling is bet- 
ter explained as a response to Antarctic ice 
growth, itself caused by a decrease in the con- 
centration of atmospheric carbon dioxide. 

Before the cooling at the Eocene-Oligocene 
transition (EOT), much of Antarctica was veg- 
etated and was home to flora and fauna that 
today is found nearer the Equator, including 
flowering plants, beech forests and marsu- 
pials’. Two main hypotheses have emerged 
to explain the cooling and the growth of ice 
in this period, which occurred over about
300,000 years³,⁴. The first proposal, called the
gateway hypothesis, posits that move-
ments of continental plates over millions of
years gradually widened the Drake Passage
and Tasman Gateway in the Southern Ocean, 
allowing increased ocean flow around Ant- 
arctica. This led to decreased poleward heat 
transport, which resulted in cooling of the 
Antarctic continent and growth of the Ant-
arctic ice sheet. The second proposal, called
the CO₂ hypothesis, postulates that a decreased
concentration of atmospheric greenhouse
gases, in particular CO₂, led directly to cool-
ing of the Antarctic continent and growth of
the ice sheet.

One line of evidence previously used in 
favour of the gateway hypothesis concerns 
the spatial pattern of EOT ocean-temperature 
change. This pattern is derived by drilling deep 
into the modern ocean floor and extracting 
cores of ancient ocean sediments. The cores are 
analysed for their chemical and isotopic com- 
position, providing insights into changes in 
ocean temperature over time. By drilling cores 
at various locations and depths in the Atlantic, 
such analysis has revealed increasing cooling 
down to a depth of 2 kilometres, and increasing 
cooling from the Equator southwards. Until 
now, it had been suggested that this signature 
was best explained by the gateway hypothesis⁵,
being similar to that predicted as a response to
changes in ocean circulation associated with
evolving ocean gateways⁶.

However, Goldner et al. suggest that such a
signature could be equally, or better, explained
by the CO₂ hypothesis. When the researchers
included an enlarged Antarctic ice sheet in 
their numerical climate model of the EOT, the 
predicted temperature change in the Atlan- 
tic was very similar to that inferred from the 
ocean sediment cores. This was not the case 
when they imposed modifications to the ocean 
gateways in line with the gateway hypoth-
esis (in contrast to previous work⁵,⁶, which the
authors argue used modelling tools that are less
advanced than their own). As such, Goldner
et al. conclude that their work provides support
for the CO₂ hypothesis.

One of the strengths of this paper is that 
Goldner and colleagues have carefully ana- 
lysed their model results to understand the 
climatic mechanisms (Fig. 1) that give rise to 
the model-predicted patterns of ocean cooling 
and circulation. However, in my view, several 
interesting issues still remain. 
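
As a rough illustration of the wind-driven step in the mechanism shown in Fig. 1, the sketch below applies the textbook steady-state Ekman relation, in which the depth-integrated flow is directed at right angles to the surface wind stress (to the left of the wind in the Southern Hemisphere). It is not taken from Goldner and colleagues' model; the latitude, wind stress and seawater density are illustrative assumptions.

```python
import math

OMEGA = 7.2921e-5      # Earth's rotation rate (rad/s)
RHO_SEAWATER = 1025.0  # assumed seawater density (kg/m^3)


def coriolis_parameter(lat_deg):
    """Coriolis parameter f = 2*Omega*sin(latitude); negative south of the Equator."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def ekman_volume_transport(tau_x, tau_y, lat_deg, rho=RHO_SEAWATER):
    """Depth-integrated Ekman transport per unit width (m^2/s).

    tau_x, tau_y: eastward and northward wind stress (N/m^2).
    Returns the (eastward, northward) transport components.
    """
    f = coriolis_parameter(lat_deg)
    return tau_y / (rho * f), -tau_x / (rho * f)


# Hypothetical example: easterly winds (stress directed westward) at 65 degrees S.
u_ek, v_ek = ekman_volume_transport(tau_x=-0.1, tau_y=0.0, lat_deg=-65.0)
print(f"northward transport: {v_ek:.2f} m^2/s")  # a negative value means southward
```

With these assumed numbers the transport is directed towards Antarctica, which is the sense of flow described in the figure.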

Figure 1 | Earth-system change at the Eocene-Oligocene transition. a,b, The diagram shows the Atlantic sector of the high latitudes of the Southern Hemisphere
before (a) and after (b) growth of Antarctic continental ice at the Eocene-Oligocene transition 34 million years ago, as modelled by Goldner and colleagues¹. At this
transition, a decreasing concentration of carbon dioxide in the atmosphere leads to atmospheric cooling and growth of Antarctic continental ice and sea ice. The
resulting ice sheet induces a north-south atmospheric pressure gradient near the surface, which drives increased easterly surface winds (moving towards the west;
into the page as indicated by the blue arrow) around Antarctica. These winds change the ocean circulation, enhancing southwards ocean flow through a process
known as Ekman transport. This dense (cold and relatively salty) water mass flows downwards as it reaches the Antarctic coast. Diagram not to scale.

First, although the authors make inferences
about the causes of Antarctic ice-sheet growth,
they cannot tackle this explicitly because their
modelling does not include a full representa-
tion of the interactions between ice and cli-
mate. It is possible that a change in gateways
caused cooling that led to the growth of the
Antarctic ice sheet, and it is these effects that
are seen in the geological record. The causes
of ice-sheet growth (or retreat) are best
understood through the use of integrated cli-
mate and ice-sheet models. Ice-sheet models
have recently undergone a period of rapid
development (see ref. 7, for example), so the
time is ripe to apply these to palaeoclimate 
events such as the EOT. 

Second, models are by definition approxi- 
mations of the real world, and it will be cru- 
cial for other groups to verify these findings 
with their own models. This is particularly 
important because other recent modelling 
work⁸ has indicated surface warming in the
Atlantic sector in response to an increased 
Antarctic ice sheet — although that study 
focused on the more recent Middle Miocene 
climate transition, which occurred about 
14 million years ago. This is in contrast to 
the earlier cooling found by Goldner and 
colleagues. 

Third, the question also remains as to 
why greenhouse gases changed at this time, 
and by how much. Indirect estimates of CO₂
concentration indicate a concurrent drop’”’, 
although the uncertainties for these estimates 
are currently large. Possible causes include 
changes in the balance of sources (for exam- 
ple, decreased volcanism) and/or sinks (such as 
increased weathering of silicate rocks), and/or 
changes to reservoirs of carbon (for instance, an 
increase in the residence time of carbon in the 
ocean, owing to changes in ocean circulation). 
Picking apart these possible causes is a crucial 
challenge. 

Goldner and colleagues conclude their 
paper with a word of warning, noting that a 
complex web of positive and negative feed- 
backs means that the climate system can often 
behave unexpectedly. This can be interpreted 
as a strong note of caution regarding human-
ity’s own current CO₂ ‘experiment’ with the
climate system.


The mixed blessing 
of interferon 


A study in monkeys finds that treatment with the protein interferon protects
against simian immunodeficiency virus, but that prolonged interferon 
administration exacerbates the chronic stage of the infection. SEE LETTER P.601 


AMALIO TELENTI 


Soon after the identification of the AIDS
epidemic in the early 1980s, many
research groups reported on using
interferon, a protein that triggers an anti- 
viral response, to treat HIV-infected patients. 
However, an early review of the literature¹
concluded that there was no evidence that 
interferon therapy exerted “any beneficial 
effect on the underlying immune defects” and 
that, in any case, many patients with advanced 
HIV-induced immunosuppression already 
exhibited elevated levels of interferon before 
its administration. But on page 601 of this 
issue, Sandler et al.² report intriguing findings
on interferons in HIV infection. They show
that administering the interferon IFNα2a to
rhesus macaques before exposure to simian 
immunodeficiency virus (SIV) prevented sys- 
temic viral infection, whereas treatment with 
an antagonist of the interferon receptor led to 
catastrophic infection. But they also find that 
prolonged exposure to interferon has a detri- 
mental effect. 

Figure 1 | Interferon in acute and chronic
infection. Sandler and colleagues’ findings²
contribute to other evidence**””"” that there are
strong parallels between the observed effects of
experimentally administered interferon (IFN)
on simian immunodeficiency virus (SIV) or HIV
infections and the effects of endogenous IFN
responses during natural SIV or HIV infections, at
both the acute and chronic phases.

So what are the exact mediators of interferon
that contribute to its beneficial effects in San-
dler and colleagues’ study? Triggering the inter-
feron response induces the expression of several
hundred genes³. The roles of many, if not most,
interferon-stimulated proteins are unclear and
little is known about their specificity. Some
selectively target HIV and/or SIV, and these are 
generally referred to as restriction factors’. Such 
factors have protected humans from transmis- 
sion of retroviruses from other species, but they 
failed to guard us against SIV from chimpan- 
zees, and HIV is the result of this transmis- 
sion. Unfortunately, HIV in humans and SIV 
strains in their natural hosts have developed 
mechanisms to evade these restriction factors. 
Thus, Sandler and colleagues’ observed suc- 
cess with interferon administration probably 
results from other aspects of the interferon 
response, or from the actions of undescribed 
interferon-stimulated proteins that retain suffi- 
cient general or specific antiviral activity. These 
are not very satisfactory explanations when one 
considers that the goal is to understand human 
responses to a therapeutic strategy. 

The finding that interferon treatment 
during SIV challenge increases host resistance 
to systemic infection reinforces previous data 
from studies in non-human primates, and 
places in perspective some existing data on 
interferon administration in humans (Fig. 1). 
In HIV-infected individuals® and in individu- 
als co-infected with hepatitis C virus’, use of 
interferon results in a modest decrease in HIV 
viral load in the blood. It was therefore a logical 
step for Sandler et al. to assess the effects of 
prolonged interferon administration during 
SIV infection. They observed that, despite 
the early beneficial effects, continued admini- 
stration resulted in increased susceptibility 
to infection and greater depletion of CD4⁺
T cells (immune cells that are killed by SIV and 
HIV), and decreased expression of interferon- 
stimulated genes, compared to placebo. 

They describe this paradox as an interferon- 
desensitized state, and demonstrate that this 
is not the result of the animals developing 
neutralizing antibodies against the interferon. 
Interestingly, earlier studies had described a 
refractory state of cells to repeat interferon 
induction’. Furthermore, there is evidence*” 
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that a persistent abnormal interferon response 
— elevated expression of interferon-stimulated 
genes — is linked to increased viral load and 
disease severity during the chronic phase of 
HIV infection. Thus, both continuous admin- 
istration of interferon and persistent elevation 
of interferon-stimulated genes are associated 
with unfavourable outcomes in chronic HIV 
and SIV infection. 

HIV is not thought to become resistant 
to interferon in chronic infection — in fact, 
recent data suggest the contrary"’. Instead, 
there is increasing concern that when the host 
cannot clear an infection, continued interferon 
signalling leads to the induction of immuno- 
suppressive pathways with the aim of limit- 
ing damage associated with chronic infection 
(reviewed in ref. 11). This concept resonates 
with the idea of ‘tolerance’, in which the host
attempts to reduce the negative impact of an 
infection without directly affecting pathogen 
burden”. Tolerance is frequently observed in 
non-pathogenic SIV infections in natural hosts:
animals may have a high viral load but there
is minimal pathogenesis, and the interferon 
response normalizes after the initial stage of 
infection’. Unfortunately, in HIV infections 
the host rarely finds the sweet spot between 
viral load, disease and interferon: tolerance is 
rarely observed in humans” and progressive 
immunosuppression almost always ensues. 

Overall, there are strong parallels between 
the effects of deliberate interferon administra- 
tion in experimental SIV infections and the 
outcome of endogenous interferon responses 
in natural HIV and SIV infection (Fig. 1). 
Sandler and colleagues’ research highlights 
the importance of the timing and duration 
of interferon administration, but also under- 
scores the difficulty of understanding the 
exact potential of intervening in the interferon 
signalling pathways during infection. 

The interferon response, and more broadly, 
the innate immune response, remains a field 
of unknowns". Challenges include identify- 
ing the exact set of effector molecules, their 
roles at portals of entry of pathogens and their 
individual actions in acute and chronic disease. 
Thus, deconvoluting the interferon response is 
needed if the aim is to better use this pathway 
for therapeutic purposes. The complex anti- 
viral response to this protein can be summa- 
rized by the Cantonese expression ‘equipped 
with knives all over, yet none is sharp’ — a 
fitting metaphor for the failure of interferon to 
cut cleanly through HIV infections.

The path most travelled


Continuous tracking of the random trajectories of a superconducting quantum 
system as it evolves between two selected initial and final states has allowed 
researchers to determine the most probable path of the system. SEE LETTER P.570 


ADRIAN LUPASCU 


Trajectories of various kinds, such as
those of aeroplanes and migrating birds,
are a familiar part of our everyday expe-
rience. A trajectory is given, classically, by the
position coordinates of an object as a function 
of time. In the realm of quantum mechanics, 
the state of an object is given not in terms of 
position, but rather by a more abstract math- 
ematical construct — a vector in the Hilbert 
space. Nevertheless, quantum states can be
parameterized by a set of coordinates whose
evolution in time defines quantum trajectories. 
For a quantum system in isolation, quantum 
trajectories are deterministic, bearing a decep- 
tive similarity to classical trajectories. However, 
the act of observation makes quantum trajec- 
tories random, revealing their fragile character. 
On page 570 of this issue, Weber et al.¹ analyse
these random trajectories and find that the 
most likely ones still provide insightful infor- 
mation about a quantum system. 

Figure 1 | Transmon trajectories. Weber et al.¹ have measured the evolution of the quantum state of a
transmon superconducting device. The quantum state is represented by the coordinate z, and its change
over time defines a quantum trajectory. In the four sets of trajectories shown here (a–d), trajectories were
selected whose coordinates at the initial and final times were close to chosen initial and final values of z. The four panels
correspond to four experiments that differ in the strength of the measurement (c and d correspond to stronger
measurements) and in the total evolution time (b and d correspond to a longer evolution time). The trajectories
are ‘blurred’, reflecting the randomness of the measurements and the corresponding random change of
state. However, the average trajectory in each case carries information on the dynamics of the system.

To study quantum trajectories, Weber and
colleagues used a transmon, a micrometre-
sized superconducting device that at low
temperatures behaves as a quantum two-state 
system, with states dubbed 0 and 1. The trans- 
mon and other superconducting quantum 
devices are under intense investigation because 
of their potential applications in quantum com- 
puting’ and as test systems for investigating 
fundamental aspects of quantum mechanics’. 

The authors’ experiment builds on major 
advances in this field over the past few years. The 
first of these is the reduction in the decoherence 
of transmons. Any quantum system that 
interacts with its environment will ‘forget’ 
its state — that is, the system will decohere. In 
general, this effect hampers the observation 
of quantum behaviour* and would mask the 
quantum effects observed by Weber and col- 
leagues. The second advance is the development 
of ‘almost-perfect’ measurement methods for 
transmons. In quantum mechanics, even per- 
fect measurements are fundamentally limited: 
any single measurement provides only partial 
information on the state of an observed system. 
Moreover, after a measurement has been taken, 
the state of the system is changed irreversibly. 
The key feature of perfect quantum measure- 
ments is that the change of state following a 
measurement is the smallest possible allowed by 
the fundamental laws of quantum mechanics’. 

Weber et al. measured the state of the 
transmon by coupling it to a device known 
as a superconducting resonator. The natural 
frequency of oscillation of microwaves in this 
resonator depends on the transmon’s state. 
The authors sent microwaves to the resonator 
and continuously monitored how they were 
scattered. The scattered microwaves were pro- 
cessed further to extract a continuous signal 
containing information related to the state of 
the transmon. 

This detection scheme involves a continuous 
‘weak’ measurement of the transmon. Weak 
measurements have been investigated both 
theoretically® and experimentally’. To under- 
stand this type of measurement, assume that, 
before a measurement is started, the quantum 
state of the system under investigation is well 
known. Next, the measurement apparatus is 
turned on. The measurement signal taken 
over a short time interval is a nearly random 
quantity, by itself insufficient to infer the sys- 
tem’s state. However, the prior knowledge of 
the state combined with the tiny bit of infor- 
mation obtained from the measured signal is 
enough to fully infer the system's new quantum 
state. This process can be extended over the full 
duration of the measurement procedure; by 
using the continuous signal from the measure- 
ment apparatus, knowledge of the quantum 
state can be continuously updated. The change 
of the state over time is a quantum trajectory. 
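
The updating step described above can be sketched with a classical Bayesian toy model. It assumes (these are illustrative choices, not details taken from Weber and colleagues' experiment) that the signal averaged over a short interval dt is Gaussian, centred on +1 for state 0 and on -1 for state 1, with variance tau/dt, where tau is a hypothetical characteristic measurement time; that the coordinate z is the probability of state 0 minus the probability of state 1; and that there is no drive and no decoherence. Under those assumptions, Bayes' rule reduces to adding r*dt/tau to the inverse hyperbolic tangent of z.

```python
import math
import random


def update_log_odds(u, r, dt, tau):
    """One Bayesian update of u = atanh(z), given the signal r averaged over dt."""
    return u + r * dt / tau


def simulate_trajectory(z0, steps, dt, tau, rng):
    """Return the list of z values along one simulated measurement record."""
    sigma = math.sqrt(tau / dt)      # noise of the short-time signal
    u = math.atanh(z0)               # requires -1 < z0 < 1
    zs = [z0]
    for _ in range(steps):
        z = math.tanh(u)
        # The signal is centred on +1 with probability (1 + z) / 2, else on -1.
        centre = 1.0 if rng.random() < (1.0 + z) / 2.0 else -1.0
        r = rng.gauss(centre, sigma)
        u = update_log_odds(u, r, dt, tau)
        zs.append(math.tanh(u))
    return zs


rng = random.Random(0)
trajectory = simulate_trajectory(z0=0.0, steps=200, dt=0.01, tau=1.0, rng=rng)
print(trajectory[-1])  # a different seed gives a different random trajectory
```

Each short-time signal is almost pure noise, yet the running estimate of z is well defined at every step because it combines the new signal with everything learned before.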

The random nature of quantum trajectories 
reflects the fact that the change of the quantum 
state at each time in the measurement process 
depends on the measurement result, which is
itself random. In their experiment, Weber et al.
analyse trajectories in the following way. A set 
of trajectories is selected that is conditional 
on the initial and final states of the transmon. 
Although such a set of trajectories is random, 
the most likely one is found to provide valu- 
able information about the transmon (Fig. 1). 
The most probable path is one that reflects, on 
the one hand, the tendency of the transmon to 
settle in state 0 or 1 and, on the other hand, its 
tendency to oscillate between these two states. 
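
The selection step can be mimicked numerically under toy assumptions (a Gaussian signal centred on +1 or -1, no drive, no decoherence; none of the parameter values come from the experiment). The sketch keeps only simulated trajectories that end inside a chosen window of z and averages them point by point. The mean of the selected set is used here as a simple stand-in; it is not the action-based most probable path computed in the paper.

```python
import math
import random


def random_trajectory(z0, steps, dt, tau, rng):
    """One toy weak-measurement trajectory of z (no drive, ideal detector)."""
    u, zs = math.atanh(z0), [z0]
    for _ in range(steps):
        centre = 1.0 if rng.random() < (1.0 + math.tanh(u)) / 2.0 else -1.0
        u += rng.gauss(centre, math.sqrt(tau / dt)) * dt / tau
        zs.append(math.tanh(u))
    return zs


def postselected_mean(trajectories, z_low, z_high):
    """Average, time step by time step, the trajectories ending in [z_low, z_high]."""
    kept = [t for t in trajectories if z_low <= t[-1] <= z_high]
    if not kept:
        raise ValueError("no trajectory ended inside the selection window")
    return [sum(column) / len(kept) for column in zip(*kept)], len(kept)


rng = random.Random(1)
runs = [random_trajectory(0.0, steps=50, dt=0.02, tau=1.0, rng=rng) for _ in range(500)]
mean_path, n_kept = postselected_mean(runs, z_low=0.5, z_high=1.0)
print(n_kept, "of", len(runs), "trajectories kept; mean final z =", round(mean_path[-1], 3))
```

Without a drive, this toy model shows only the tendency to settle in one state; reproducing the oscillation between states would require adding a drive term, which is omitted here.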

The most likely trajectory can be theoreti- 
cally calculated® by requiring that a global 
measure of the trajectory, the action, is an 
extremum — that is, insensitive with respect to 
small changes in the trajectory. This approach 
establishes an intriguing connection with 
other theories in which path optimization is 
key, such as Fermat's least-time principle for 
light propagation, the Hamilton principle 
for dynamics in classical mechanics, and also 
the formulation of quantum mechanics in 
terms of mathematical entities known as path 
integrals. 
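
For readers who want the classical analogue spelled out, the Hamilton principle mentioned above can be written as follows. This is the standard textbook statement for a classical system with coordinate q and Lagrangian L, with the end points of the path held fixed (the symbols are introduced here for illustration). The action used for quantum trajectories in the work discussed is a different, stochastic quantity, so the block shows only what 'extremal action' means.

```latex
S[q] = \int_{t_i}^{t_f} L\bigl(q(t), \dot q(t), t\bigr)\,\mathrm{d}t ,
\qquad
\delta S = 0
\;\Longrightarrow\;
\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial \dot q}
- \frac{\partial L}{\partial q} = 0 .
```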

Weber et al. have successfully measured the 
statistics of quantum trajectories for their trans- 
mon device and shown that the most probable 
trajectory is in agreement with calculations 
based on extremal action’®. An interesting par- 
allel may be drawn with classical trajectories. 
Fluorescent markers can be used to character- 
ize flow patterns or biological processes. Analo- 
gously, quantum trajectories carry information 
about the time dynamics of quantum systems. 
The use of weak measurements to determine
quantum trajectories can therefore provide 
information about the parameters of the system 
that generate the dynamics. Another potential 
application of weak measurements is the prepa- 
ration of quantum states. 

Further development of the experiments 
described here will have to address the fidelity 
of the measurement procedure, which has an 
efficiency of 40% in its current form. The use of 
this method for quantum parameter and state 
estimation will require rigorous investigation, 
particularly with regard to how the method 
compares with similar protocols based on 
strong quantum measurements. 

50 Years Ago


At the meeting of the Society
for Visiting Scientists on June 3,
great interest was expressed in
the potentialities of international
research centres. It was admitted
from the outset that there is no
intrinsic merit in international
research as such, since the important
thing in any scientific work is the
result, not how or where the work
is carried out. The justification for
any proposed international effort 
must therefore be carefully examined 
... The experience of running 
CERN presented some interesting 
lessons ... CERN had been formed 
at a time when there was a sense of 
togetherness among most of the 
nations of Western Europe, a feeling 
which sought for some practical 
expression. It was important that any 
concrete form which could be given 
to it should not be controversial, 
should not be military, should not be 
a political disaster if it failed and that 
success if achieved should be clearly 
recognizable. If these factors were 
present, the way was open for the 
scientists, who, after all, led the world 
in international co-operation, to exert 
pressure on their political colleagues. 
From Nature 1 August 1964 


100 Years Ago 


Many instances are on record of
so-called “wolf-children,” said to
have been found in the jungles
of India. A strange story is now
reported from Naini Tal, the
summer capital of the United
Provinces of Agra and Oudh, of a
female child about nine years old 
found in this neighbourhood, and 
unable to eat anything except grass 
and chapatis or native griddle cakes. 
She has a great mat of head hair and 
a thick growth on the sides of her 
face and spine. She bears marks of 
vaccination and is clearly a child 
who had, years ago, been abandoned 
or strayed into the jungle. 

From Nature 30 July 1914 

EPIGENETICS


Cellular memory erased 
in human embryos 


Two analyses of human eggs, sperm and early-stage embryos reveal a 
pronounced loss of DNA methylation (a molecular modification that affects
gene transcription) after fertilization. SEE LETTERS P.606 & P.611


WOLF REIK & GAVIN KELSEY 


Epigenetic modifications are changes to
the genome that can affect gene expres-
sion without altering DNA sequence.
Like DNA itself, certain epigenetic modifica- 
tions can be copied faithfully when cells divide, 
allowing daughter cells to retain this informa- 
tion from their parents. This ensures that gene 
expression is maintained in a stable manner 
down cell lineages. One such modification is 
methylation, whereby methyl groups are added 
to DNA. Two papers in this issue¹,² show that
there is a massive loss of DNA methylation 
from most of the genome immediately after 
fertilization in human embryos. Thus, meth-
ylation memory is erased on a global scale,
an epigenetic reprogramming step that seems 
to be fundamental in mammals. 
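
A toy sketch may help to picture how a methylation pattern can be copied when a cell divides. It assumes the usual picture in which a CG dinucleotide on one strand faces a CG dinucleotide on the complementary strand, so that a methyl mark on the parental strand indicates where the newly made strand should be marked. The sequence, function names and coordinates are hypothetical and chosen only for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cpg_sites(seq):
    """Indices of the C in every CG dinucleotide on this strand."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def copy_marks_to_new_strand(seq, parental_marks):
    """Toy maintenance step.

    parental_marks: indices of methylated C's on the parental strand.
    Returns the positions (in parental-strand coordinates) of the C's on the
    new strand that should be methylated: each one pairs with the G that
    follows a methylated parental C.
    """
    sites = set(cpg_sites(seq))
    return {i + 1 for i in parental_marks if i in sites}


top = "ATCGGACGT"                          # hypothetical sequence
print(cpg_sites(top))                      # CpG positions on the parental strand
print(cpg_sites(reverse_complement(top)))  # the same sites seen from the other strand
print(copy_marks_to_new_strand(top, {2}))  # only the first CpG is methylated
```

Because every CG on one strand has a partner CG on the other, the marks on the old strand are enough to rebuild the full pattern on the new one.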

DNA methylation usually represses tran- 
scription, and primarily occurs on cytosine 
bases in the dinucleotide sequence cytosine- 
guanine (CpG, where p denotes the phos- 
phate backbone of DNA, indicating that the 
nucleotides are on the same DNA strand). 
Because Watson-Crick base-pairing dictates 
that C pairs with G on complementary DNA 
strands, CpG sequences align and both strands
are methylated in the same place. Therefore,
methylation patterns can be passed on when
cells divide, through the CpG ‘memory mod-
ule’. This inheritance of epigenetic information
is vital in specialized cell lineages, which must
maintain their identity as they divide: for
example, dividing blood cells maintain their
epigenetic identity to give rise to daughters that
are also blood cells.

Guo et al.¹ (page 606) and Smith et al.²
(page 611) analysed genome-wide DNA meth-
ylation in early-stage human embryos by high-
throughput sequencing. They studied eggs,
sperm, fertilized eggs (zygotes) and embryos
at various stages of development, including the
blastocyst stage, which occurs just before the
embryo becomes implanted in the uterus, and
a post-implantation stage. Both groups found
that the DNA of sperm was highly methylated
and that of eggs moderately so (much like
mouse sperm and eggs *). However, zygotes
and two-cell embryos had lost a large propor-
tion of this methylation. In particular, Guo
et al. observed marked demethylation of the
paternal, sperm-derived genome, compared
with more-modest demethylation of the
maternal genome.

At the blastocyst stage, methylation levels
remained low. This was true in all blastocyst
cell types, including the cells of a structure
called the inner cell mass, which are pluri-
potent (they can give rise to every cell of the
body). Previous research indicates® that epi-
genetic memory must be erased for embry-
onic cells to achieve pluripotency, providing a
possible explanation for global demethylation.
By contrast, both groups observed that, after
implantation, when cells had begun to adopt
tissue-specific identities, DNA methylation
rapidly rose to a level characteristic of dif-
ferentiated cells. After a near-total wipe-out,
the epigenetic memory system was back in
place (Fig. 1).

Figure 1 | Tracking the state of DNA methylation. Guo et al.¹ and Smith
et al.² investigated DNA methylation during early development of the human
embryo. The DNA of human sperm is highly methylated, and that of eggs
less so (sperm and egg not drawn to scale). However, once the egg has been
fertilized, methylation is largely lost, more so from the paternal than from
the maternal genome. As the embryo begins to develop, methylation marks
continue to be lost from the maternal genome of cells up to the blastocyst
stage. After this stage, DNA in differentiating cells becomes remethylated,
allowing specialized cell types to pass instructions about control of gene
transcription to their daughters.

These results, combined with those from
mice’* and other mammals”””, suggest that
global methylation reprogramming after
fertilization is evolutionarily conserved. Per-
haps this is because early-stage mammalian
embryos undergo rapid transcriptional acti-
vation, together with early diversification of
cell types, factors that necessitate a transient
pluripotent state. Nonetheless, it is remarkable
that the demethylation kinetics of mouse and
human embryos are so similar, given that other
aspects of their early development are less con-
served. For example, the major transcriptional
activation of the embryonic genome occurs at
the two-cell stage in mouse embryos, whereas
in humans it takes place at the transition
between four and eight cells.

It is exciting that Guo and colleagues
detected an alternative form of epigenetic
modification called hydroxymethylation
preferentially in the paternal genome, because
hydroxymethylation is implicated in demeth-
ylation in mice”. This reinforces the idea
that major mechanisms of epigenetic repro-
gramming are conserved in mammals. The
studies did not address the mechanisms of
demethylation further. Such analyses are
challenging in human embryos, but Smith 
and co-workers have taken a first step, growing 
pluripotent embryonic stem cells derived from 
blastocyst-stage embryos in vitro, and finding 
that the cells become rapidly remethylated. 
This might be a viable system for manipulat- 
ing and so studying genome-wide methylation 
and demethylation in human embryos, as is 
possible in mice. 

Genome-wide analyses permit a detailed 
survey of distinct regions of DNA sequence 
whose function is known to be modified by 
methylation, allowing investigation of how 
they behave in the face of global demethyla- 
tion. Such regions include ‘imprinted’ genes, 
CpG-rich genetic regions called CpG islands, 
and transposons (DNA sequences that can 
move about the genome). 

Imprinted genes are those that are expressed 
preferentially from one parental chromosome 
(maternal or paternal), unlike most genes, 
which can be expressed from both chromo- 
somes. Unusually, epigenetic memory in 
imprinted sequences is retained throughout 
development. The two groups confirmed this 
in human embryos, which they found carried 
methylation memories from the embryos’ 
parents in conserved imprinted regions. 

The authors found that, in contrast to 
sperm, human eggs had hundreds of methylated
CpG islands that differed from those in
mouse eggs and, as a general rule, these
maternal epigenetic marks were not well
maintained after fertilization in the embryos
of mice or humans. Perhaps this reflects a
difference in the development of the egg in 
the two species that is no longer relevant after 
fertilization. Alternatively, some of these 
maternal epigenetic signals may be required 
only in the early embryo, and thus could contribute
to species differences in imprinting,
particularly in the placenta.

Transposons need to be treated with caution 
during reprogramming, because demethyla- 
tion might cause their transcriptional acti- 
vation. If they are evolutionarily ‘young’ and 
relatively unmutated, this might lead to their 
being able to move around in the genome, 
which could result in unwanted mutations. 
Guo and colleagues investigated one class of 
transposon, LINE elements, and found that 
evolutionarily young elements were more 
resistant to demethylation than their older 
counterparts. 

The new studies provide an atlas of methyla- 
tion reprogramming in early human embryos 
and hence a foundation for studying epigenetic 
regulation of human development. This is 
vital if we are to understand the epigenetic 
mechanisms that control pluripotency and 
differentiation. Such understanding will also 
help in assessing the long-term consequences
of fertility interventions, including in vitro
fertilization, for human health.


Protein-export pathway illuminated


Two studies provide evidence that the protein complex PTEX is needed for export
of malaria-parasite proteins into the cytoplasm of infected cells, and that such
export is essential for parasite survival. SEE LETTERS P.587 & P.592


SANJAY A. DESAI & LOUIS H. MILLER


Malaria parasites export hundreds of
proteins into the red blood cells that
they infect. These proteins increase
nutrient uptake from blood plasma, facilitate
adhesion of the infected cell to endothelial cells
in blood vessels and markedly remodel the red
blood cell for the parasite’s benefit. A parasite
protein complex called Plasmodium trans-
locon of exported proteins (PTEX) has been
proposed to traffic these proteins across the
membrane of the cellular vacuole that sepa-
rates the parasite from the cytoplasm of the
infected cell. In two papers in this issue, Beck
et al. (page 592) and Elsworth et al. (page 587)
definitively show protein transport through
PTEX.

Protein interaction studies have suggested
that the PTEX translocon consists of five pro-
teins and that one of these, EXP2, forms a pore
through which proteins are threaded after they
have been unfolded, in an energy-dependent
process, by the chaperone protein HSP101.
The new studies demonstrate the function
of this protein complex in the infected cell by
ablating the activity of HSP101 and PTEX150,
a PTEX component with unknown function.
Suppression of either component, which
was achieved by transcriptional repression
or protein destabilization, was found to pre-
vent export of two broad categories of parasite
protein.

In Plasmodium falciparum, one of the main
human-infecting malaria parasites, most
exported proteins contain a sequence of five
amino-acid residues, called the PEXEL motif,
near their amino terminus. Cleavage within
this motif by the enzyme plasmepsin V com-
mits the mature protein for export to the host
cell, but the precise nature of the commitment 
step remains debated. The second category
of exported proteins lacks this motif. Such 
PEXEL-negative exported proteins, of which 
there is an expanding list, lack unifying fea- 
tures and thus have defied attempts to predict 
how they are recognized and exported. The
authors’ observations of inhibited export of
proteins from both categories (which cover
the full spectrum of export timings, protein
biophysical properties and destinations in
the host cell) implicate PTEX as a crucial
bottleneck in parasite-induced remodelling of 
the red blood cell. 
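
As a rough illustration of how a PEXEL-like motif might be screened for near the amino terminus, the sketch below uses a regular expression. It rests on assumptions that are not stated in this article: the consensus is taken to be the commonly quoted RxLxE/Q/D pattern, the 100-residue search window is arbitrary, and both sequences are invented toy strings rather than real parasite proteins.

```python
import re

# Assumption (not from the article): the PEXEL consensus is commonly written
# RxLxE/Q/D, i.e. Arg, any residue, Leu, any residue, then Glu, Gln or Asp.
PEXEL = re.compile(r"R.L.[EQD]")


def find_pexel(protein, window=100):
    """Return (start_index, motif) for the first PEXEL-like match within the
    first `window` residues, or None if there is no match.

    `window` is a hypothetical cutoff standing in for "near the amino terminus".
    """
    match = PEXEL.search(protein[:window].upper())
    if match is None:
        return None
    return match.start(), match.group()


# Hypothetical toy sequences, not real parasite proteins.
exported_like = "MKFSAVLLAGCIFSNSSRKLAEGTTPN"
pexel_negative_like = "MKFSAVLLAGCIFSNSSGKAAEGTTPN"

hit = find_pexel(exported_like)
miss = find_pexel(pexel_negative_like)
```

A screen of this kind would by construction miss the PEXEL-negative exported proteins described above, which is one reason they are hard to predict from sequence alone.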

These findings raise new questions about 
the translocation process. How does PTEX 
recognize and transport such diverse proteins, 
but allow other proteins to remain in the para- 
sitophorous vacuole (the parasite-containing 
compartment in the host cell that is formed 
during invasion and further modified dur- 
ing parasite growth)? Are retained proteases, 
chaperones and other enzymes not recognized 
by PTEX or do they carry specific signals that 
prevent export? To be threaded through PTEX, 
exported proteins must first be unfolded, as 
has been shown with a reporter protein that 
was prohibited from unfolding by a tightly 
bound substrate. Whether HSP101 or other
chaperones actually catalyse the unfolding of 
each exported protein is unknown. Also still 

Figure 1 | Transport of parasite proteins into the host red blood cell. The protein complex PTEX
mediates export of malaria-parasite proteins across the parasitophorous vacuolar membrane (PVM),
which separates the cytoplasm of an infected cell from the vacuole in which the parasite resides. Parasite
proteins are first secreted into the vacuolar space, in which they are unfolded before transport through
PTEX; they are then refolded in the host-cell cytoplasm. Some membrane proteins might undergo
lateral transfer (dashed arrow) from PTEX into the PVM, a process that would allow movement within
membranes to reach vesicles known as Maurer’s clefts. Exported proteins then localize to specific sites in
the cell or on the red-cell membrane, at which they serve functions that are crucial to intracellular parasite
growth. Examples include nutrient uptake through PSAC and PfEMP1-mediated binding of infected cells
to endothelial cells that line blood vessels.


unclear are whether EXP2 indeed defines the pore
and what roles the other PTEX components
might serve. Could membrane proteins pass-
ing through PTEX undergo lateral transfer
into the parasitophorous vacuolar membrane
to allow migration along membranous exten-
sions (Fig. 1), as established for translocons
in other organisms? Finally, proteins that
pass into the host-cell cytoplasm will require
refolding, presumably by parasite chaperones
that are also exported and must somehow be
refolded themselves.

Another fundamental finding of these 
studies is that suppression of protein export 
interferes with intracellular parasite growth, 
indicating that exported proteins have essential 
roles in parasite survival. The authors observed 
adverse effects on parasite development 
in vitro and in vivo, with immature ring-stage 
parasites unable to mature to the trophozoite 
stage. By contrast, inhibiting PTEX after 
maturation to the trophozoite stage was well 
tolerated, with no effect on parasite egress 
from the cell or invasion of new red blood 
cells, suggesting that these latter processes do 
not depend on proteins exported late in the 
cycle. But development of early-stage gameto- 
cytes, the sexual stage of the parasite life cycle 
required for malaria transmission by mosqui- 
toes, was also severely compromised. 

Which activities of the numerous exported 
proteins account for the parasite growth 
inhibition seen in these studies? Although 
binding of infected cells to endothelial
receptors is required for parasite survival 
in vivo, it is dispensable for in vitro culture. A 
leading candidate is the uptake of nutrients by 
the plasmodial surface anion channel (PSAC), 
an essential activity associated with the para- 
site protein cytoadherence-linked antigen 3 
(CLAG3). Beck and colleagues found
that CLAG3 still enters the host-cell cytoplasm 
when PTEX is suppressed, implying that it is 
exported by a distinct mechanism, perhaps 
during invasion. At the same time, solute 
transport by PSAC was curtailed, suggesting 
that other exported proteins are required for 
nutrient-channel formation. 

In human malaria, binding of infected cells 
to the endothelium averts their destruction 
by the spleen and is primarily mediated by 
members of the P. falciparum erythrocyte 
membrane protein 1 (PfEMP1) protein family. 
Each member has multiple binding domains 
at its extracellular face and a single trans- 
membrane domain to anchor the protein over 
parasite-induced knobs on the infected cell.
How does PfEMP1 move from the parasite to
the red-blood-cell membrane? Although the
protein is not cleaved by plasmepsin V, its
atypical PEXEL motif and transmembrane
domain both seem to contribute to its export.
Subsequent refolding of the endothelium-binding
domains in the host cytoplasm
presumably requires disulphide-isomerase
enzymes to bring numerous cysteine
amino-acid residues together correctly and
may involve a battery of chaperones.
Specialized sorting organelles known as
Maurer’s clefts and proteins at the surface
knobs also seem to be required for the ultimate
insertion of PfEMP1 in the host membrane.
In light of the complex folded structure of 
PfEMP1 and the possible involvement of 
many chaperones, the compromised export 
of this protein observed by both Beck et al. 
and Elsworth et al. could reflect an indirect 
effect of PTEX inhibition. Further study will 
be required to determine the precise mecha- 
nisms for trafficking and presentation of this 
key virulence factor. 

The two new articles reveal a remarkably 
broad range of substrates for the translocon 
and provide compelling evidence that pro- 
tein export is essential for the parasite and 
therefore represents a potential therapeutic 
target. We foresee that combinations of drugs 
that target both PTEX and exported parasite 
activities, such as PSAC-mediated nutrient 
uptake, may be highly synergistic antimalarial 
therapies.
CORRECTION

The final corrections (to remove mentions 
of specific chromosomes) to the Retractions 
(Nature 511, 112; 2014) were accidentally 
omitted from the print versions. The online 
versions were correct. 
Comprehensive molecular profiling of
lung adenocarcinoma 


The Cancer Genome Atlas Research Network* 


Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 
resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, 
methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen 
genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function 
MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female 
patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred
in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these 
events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by 
somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, 
when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, 
unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations
of lung adenocarcinoma molecular pathogenesis.


Lung cancer is the most common cause of global cancer-related mortality,
leading to over a million deaths each year, and adenocarcinoma is
its most common histological type. Smoking is the major cause of lung
adenocarcinoma but, as smoking rates decrease, proportionally more 
cases occur in never-smokers (defined as less than 100 cigarettes in a life- 
time). Recently, molecularly targeted therapies have dramatically improved 
treatment for patients whose tumours harbour somatically activated oncogenes
such as mutant EGFR (ref. 1) or translocated ALK, RET, or ROS1 (refs 2-4).
Mutant BRAF and ERBB2 (ref. 5) are also investigational targets. How- 
ever, most lung adenocarcinomas either lack an identifiable driver onco- 
gene, or harbour mutations in KRAS and are therefore still treated with 
conventional chemotherapy. Tumour suppressor gene abnormalities, 
such as those in TP53 (ref. 6), STK11 (ref. 7), CDKN2A (ref. 8), KEAP1 (ref. 9),
and SMARCA4 (ref. 10) are also common but are not currently clinically 
actionable. Finally, lung adenocarcinoma shows high rates of somatic 
mutation and genomic rearrangement, challenging identification of all 
but the most frequent driver gene alterations because of a large burden 
of passenger events per tumour genome. Our efforts focused on comprehensive,
multiplatform analysis of lung adenocarcinoma, with attention
towards pathobiology and clinically actionable events.


Clinical samples and histopathologic data 


We analysed tumour and matched normal material from 230 previously
untreated lung adenocarcinoma patients who provided informed con- 
sent (Supplementary Table 1). All major histologic types of lung ade- 
nocarcinoma were represented: 5% lepidic, 33% acinar, 9% papillary, 
14% micropapillary, 25% solid, 4% invasive mucinous, 0.4% colloid and 
8% unclassifiable adenocarcinoma (Supplementary Fig. 1). Median
follow-up was 19 months, and 163 patients were alive at the time of last 
follow-up. Eighty-one percent of patients reported past or present smok- 
ing. Supplementary Table 2 summarizes demographics. DNA, RNA and 
protein were extracted from specimens and quality-control assessments 
were performed as described previously. Supplementary Table 3 summarizes
molecular estimates of tumour cellularity.
Figure 1 | Somatic mutations in lung
adenocarcinoma. a, Co-mutation plot from whole 
*A list of authors and affiliations appears at the end of the paper.
Somatically acquired DNA alterations


We performed whole-exome sequencing (WES) on tumour and germline
DNA, with a mean coverage of 97.6× and 95.8×, respectively, as performed
previously. The mean somatic mutation rate across the TCGA
cohort was 8.87 mutations per megabase (Mb) of DNA (range: 0.5-48,
median: 5.78). The non-synonymous mutation rate was 6.86 per Mb.
MutSig2CV identified significantly mutated genes among our 230
cases along with 182 similarly-sequenced, previously reported lung
adenocarcinomas. Analysis of these 412 tumour/normal pairs highlighted
18 statistically significant mutated genes (Fig. 1a shows co-mutation
plot of TCGA samples (n = 230), Supplementary Fig. 2 shows co-mutation
plot of all samples used in the statistical analysis (n = 412) and Supplementary
Table 4 contains complete MutSig2CV results, which also
appear on the TCGA Data Portal along with many associated data files
(https://tcga-data.nci.nih.gov/docs/publications/luad_2014/)). TP53 was
commonly mutated (46%). Mutations in KRAS (33%) were mutually 
exclusive with those in EGFR (14%). BRAF was also commonly mutated 
(10%), as were PIK3CA (7%), MET (7%) and the small GTPase gene, RIT1 
(2%). Mutations in tumour suppressor genes including STK11 (17%), 
KEAP1 (17%), NF1 (11%), RB1 (4%) and CDKN2A (4%) were observed.
Mutations in chromatin modifying genes SETD2 (9%), ARID1A (7%) and
SMARCA4 (6%) and the RNA splicing genes RBM10 (8%) and U2AF1
(3%) were also common. Recurrent mutations in the MGA gene (which
encodes a Max-interacting protein on the MYC pathway) occurred in
8% of samples. Loss-of-function (frameshift and nonsense) mutations 
in MGA were mutually exclusive with focal MYC amplification (Fisher’s 
exact test P = 0.04), suggesting a hitherto unappreciated potential mech- 
anism of MYC pathway activation. Coding single nucleotide variants and 
indel variants were verified by resequencing at a rate of 99% and 100%, 
respectively (Supplementary Fig. 3a, Supplementary Table 5). Tumour 
purity was not associated with the presence of false negatives identified 
in the validation data (P = 0.31; Supplementary Fig. 3b). 
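
The per-megabase mutation rate quoted in this section is a simple normalization of a mutation count by the amount of sequence covered. The sketch below shows that arithmetic. The function name, the tumour counts and the 30 Mb covered territory are all hypothetical and are not TCGA data.

```python
from statistics import mean, median


def mutations_per_megabase(n_mutations, covered_bases):
    """Somatic mutations per megabase (1 Mb = 1,000,000 bases) of covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical tumours as (mutation count, covered bases); not TCGA data.
tumours = [
    (15, 30_000_000),
    (150, 30_000_000),
    (300, 30_000_000),
    (1200, 30_000_000),
]

rates = [mutations_per_megabase(n, bases) for n, bases in tumours]
cohort_mean = mean(rates)      # pulled upwards by the most mutated tumour
cohort_median = median(rates)  # less sensitive to that tumour
```

In this toy cohort the mean exceeds the median because one tumour is heavily mutated, the same direction of skew as the reported mean of 8.87 and median of 5.78 mutations per Mb.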

Past or present smoking was associated with cytosine to adenine (C > A)
nucleotide transversions as previously described both in individual genes
and genome-wide. C > A nucleotide transversion fraction showed
two peaks; this fraction correlated with total mutation count (R² = 0.30)
and inversely correlated with cytosine to thymine (C > T) transition frequency
(R² = 0.75) (Supplementary Fig. 4). We classified each sample
(Supplementary Methods) into one of two groups named transversion-
high (TH, n = 269), and transversion-low (TL, n = 144). The transversion-
high group was strongly associated with past or present smoking (P <
2.2 × 10⁻¹⁶), consistent with previous reports. The transversion-high
and transversion-low patient cohorts harboured different gene mutations.
Whereas KRAS mutations were significantly enriched in the transversion-
high cohort (P = 2.1 × 10⁻¹³), EGFR mutations were significantly enriched
in the transversion-low group (P = 3.3 X 10°). PIK3CA and RB1 muta- 
tions were likewise enriched in transversion-low tumours (P < 0.05). 
Additionally, the transversion-low tumours were specifically enriched 
for in-frame insertions in EGFR and ERBB2 (ref. 5) and for frameshift 
indels in RB1 (Fig. 1b). RB1 is commonly mutated in small-cell lung
carcinoma (SCLC). We found RB1 mutations in transversion-low adenocarcinomas
were enriched for frameshift indels versus single nucleotide
substitutions compared to SCLC (P < 0.05), suggesting a mutational
mechanism in transversion-low adenocarcinoma that is probably dis- 
tinct from smoking in SCLC. 
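
To make the transversion-high and transversion-low grouping concrete, the sketch below computes a C > A fraction from a list of single-nucleotide substitutions and applies a cutoff. The actual classification procedure is described in the Supplementary Methods and is not reproduced here: the 0.3 threshold, the function names and the example calls are hypothetical. Substitutions are folded onto the pyrimidine strand, so that G > T is counted as C > A.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize(ref, alt):
    """Express a substitution relative to a pyrimidine (C or T) reference base."""
    if ref in ("A", "G"):
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


def c_to_a_fraction(substitutions):
    """Fraction of substitutions that are C > A (equivalently G > T)."""
    counts = Counter(normalize(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no substitutions supplied")
    return counts[("C", "A")] / total


def classify(substitutions, threshold=0.3):
    """Label a sample; `threshold` is a hypothetical cutoff, not the study's."""
    if c_to_a_fraction(substitutions) >= threshold:
        return "transversion-high"
    return "transversion-low"


# Hypothetical sample: two C > A events (one written as G > T),
# two C > T transitions (one written as G > A) and one T > C.
example = [("C", "A"), ("G", "T"), ("C", "T"), ("G", "A"), ("T", "C")]
fraction = c_to_a_fraction(example)
label = classify(example)
```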

Gender is correlated with mutation patterns in lung adenocarcinoma.
Only a fraction of significantly mutated genes from the complete set reported 
in this study (Fig. 1a) were enriched in men or women (Fig. 1c). EGFR 
mutations were enriched in tumours from the female cohort (P = 0.03) 
whereas loss-of-function mutations within RBM10, which encodes an RNA-binding protein
and is located on the X chromosome, were enriched in tumours from men
(P = 0.002). When examining the transversion-high group, 16 out of 21 
RBM10 mutations were observed in males (P = 0.003, Fisher’s exact test). 
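
Several of the enrichment and mutual-exclusivity statements in this section rest on Fisher's exact test applied to a 2 × 2 table. The sketch below is a minimal two-sided implementation using only the standard library. It is an illustration of the test itself, not the authors' pipeline, and the example table is hypothetical because the article does not give the full contingency tables.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Hypothetical table: rows are mutated / not mutated, columns are two groups.
p = fisher_exact_two_sided(8, 2, 1, 5)
```

For this hypothetical table, 400 of the 11,440 possible arrangements are at least as extreme as the observed one, giving a P value of about 0.035.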

Somatic copy number alterations were very similar to those previously
reported for lung adenocarcinoma (Supplementary Fig. 5, Supplementary
Table 6). Significant amplifications included NKX2-1, TERT,
MDM2, KRAS, EGFR, MET, CCNE1, CCND1, TERC and MECOM (Supplementary
Table 6), as previously described, 8q24 near MYC, and a
novel peak containing CCND3 (Supplementary Table 6). The CDKN2A
locus was the most significant deletion (Supplementary Table 6). Sup- 
plementary Table 7 summarizes molecular and clinical characteristics 
by sample. Low-pass whole-genome sequencing on a subset (n = 93) of 
the samples revealed an average of 36 gene-gene and gene-inter-gene 
Figure 2 | Aberrant RNA transcripts in lung adenocarcinoma associated
with somatic DNA translocation or mutation. a, Normalized exon level RNA
expression across fusion gene partners. Grey boxes around genes mark the
regions that are removed as a consequence of the fusion. Junction points of the
fusion events are also listed in Supplementary Table 9. Exon numbers refer
to reference transcripts listed in Supplementary Table 9. b, MET exon 14
skipping observed in the presence of exon 14 splice site mutation (ss mut),
splice site deletion (ss del) or a Y1003* mutation. A total of 22 samples had
insufficient coverage around exon 14 for quantification. The percentage
skipping is (total expression minus exon 14 expression)/total expression.
c, Significant differences in the frequency of 129 alternative splicing events in
mRNA from U2AF1 S34F tumours compared to U2AF1 WT
tumours (q value < 0.05). Consistent with the function of U2AF1 in 3’ splice
site recognition, most splicing differences involved cassette exon and
alternative 3’ splice site events (chi-squared test, P < 0.001).
rearrangements per tumour. Chromothripsis occurred in six of the
93 samples (6%) (Supplementary Fig. 6, Supplementary Table 8). Low- 
pass whole genome sequencing-detected rearrangements appear in 
Supplementary Table 9. 


Description of aberrant RNA transcripts 

Gene fusions, splice site mutations or mutations in genes encoding splic- 
ing factors promote or sustain the malignant phenotype by generating 
aberrant RNA transcripts. Combining DNA with mRNA sequencing 
enabled us to catalogue aberrant RNA transcripts and, in many cases, 
to identify the DNA-encoded mechanism for the aberration. Seventy- 
five per cent of somatic mutations identified by WES were present in the 
RNA transcriptome when the locus in question was expressed (minimum 
5×) (Supplementary Fig. 7a), similar to prior analyses. Previously identified
fusions involving ALK (3/230 cases), ROS1 (4/230) and RET
(2/230) (Fig. 2a, Supplementary Table 10), all occurred in transversion- 
low tumours (P = 1.85 X 10 “, Fisher’s exact test). 

MET activation can occur by exon 14 skipping, which results in a 
stabilized protein. Ten tumours had somatic MET DNA alterations
with MET exon 14 skipping in RNA. In nine of these samples, a 5’ or
3’ splice site mutation or deletion was identified. MET exon 14 skipping
was also found in the setting of a MET Y1003* stop codon mutation
(Fig. 2b, Supplementary Fig. 8a). The codon affected by the Y1003*
mutation is predicted to disrupt multiple splicing enhancer sequences, 
but the mechanism of skipping remains unknown in this case. 
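
The legend of Fig. 2b defines the percentage of exon 14 skipping as (total expression minus exon 14 expression)/total expression. The sketch below applies that definition directly. The function name and the expression values are hypothetical, and the units are whatever normalized expression measure is used for both quantities.

```python
def percent_exon_skipping(total_expression, exon_expression):
    """Percentage skipping = 100 * (total - exon) / total.

    Both arguments must be in the same normalized expression units.
    """
    if total_expression <= 0:
        raise ValueError("total_expression must be positive")
    if exon_expression < 0:
        raise ValueError("exon_expression must be non-negative")
    return 100.0 * (total_expression - exon_expression) / total_expression


# Hypothetical values: the exon is expressed at a quarter of the gene-level signal.
skipping = percent_exon_skipping(total_expression=200.0, exon_expression=50.0)
no_skipping = percent_exon_skipping(total_expression=200.0, exon_expression=200.0)
```

The total must be positive for the ratio to be defined, which matches the legend's note that samples with insufficient coverage around exon 14 could not be quantified.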

S34F mutations in U2AF1 have recently been reported in lung adenocarcinoma
but their contribution to oncogenesis remains unknown.
Eight samples harboured U2AF1 S34F. We identified 129 splicing events
strongly associated with U2AF1 S34F mutation, consistent with the role of
U2AF1 in 3’-splice site selection. Cassette exons and alternative 3’ splice
sites were most commonly affected (Fig. 2c, Supplementary Table 11).
Among these events, alternative splicing of the CTNNB1 proto-oncogene
was strongly associated with U2AF1 mutations (Supplementary Fig. 8b).
Thus, concurrent analysis of DNA and RNA enabled delineation of 
both cis and trans mechanisms governing RNA processing in lung 
adenocarcinoma. 
Candidate driver genes


The receptor tyrosine kinase (RTK)/RAS/RAF pathway is frequently 
mutated in lung adenocarcinoma. Striking therapeutic responses are 
often achieved when mutant pathway components are successfully inhib- 
ited. Sixty-two per cent (143/230) of tumours harboured known activating 
mutations in known driver oncogenes, as defined by others. Cancer-
associated mutations in KRAS (32%, n = 74), EGFR (11%, n = 26) and 
BRAF (7%, n = 16) were common. Additional, previously uncharac- 
terized KRAS, EGFR and BRAF mutations were observed, but were not 
classified as driver oncogenes for the purposes of our analyses (see Supplementary
Fig. 9a for depiction of all mutations of known and unknown
significance), explaining the differing mutation frequencies in each gene
between this analysis and the overall mutational analysis described above.
We also identified known activating ERBB2 in-frame insertion and point
mutations (n = 5), as well as mutations in MAP2K1 (n = 2), NRAS and
HRAS (n = 1 each). RNA sequencing revealed the aforementioned MET
exon 14 skipping (n = 10) and fusions involving ROS1 (n = 4), ALK
(n = 3) and RET (n = 2). We considered these tumours collectively as 
oncogene-positive, as they harboured a known activating RTK/RAS/ 
RAF pathway somatic event. DNA amplification events were not con- 
sidered to be driver events before the comparisons described below. 
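
The counts in this paragraph can be tallied to check how the oncogene-positive and oncogene-negative fractions arise. The sketch below uses only numbers stated in the text. The per-gene event counts sum to 144, one more than the 143 oncogene-positive tumours, which would be consistent with one tumour carrying two of the listed events, although the article does not say so and that reading is an assumption.

```python
# Event counts as stated in the text (events, not necessarily distinct tumours).
driver_events = {
    "KRAS": 74,
    "EGFR": 26,
    "BRAF": 16,
    "ERBB2": 5,
    "MAP2K1": 2,
    "NRAS": 1,
    "HRAS": 1,
    "MET exon 14 skipping": 10,
    "ROS1 fusion": 4,
    "ALK fusion": 3,
    "RET fusion": 2,
}

cohort_size = 230
oncogene_positive = 143  # tumours with at least one known activating event
oncogene_negative = cohort_size - oncogene_positive

total_events = sum(driver_events.values())
percent_positive = round(100 * oncogene_positive / cohort_size)
percent_negative = round(100 * oncogene_negative / cohort_size)
```

The rounded percentages reproduce the 62% and 38% quoted in the text, and the 87 oncogene-negative tumours match the group size given in the legend of Fig. 3.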

We sought to nominate previously unrecognized genomic events that 
might activate this critical pathway in the 38% of samples without a 
RTK/RAS/RAF oncogene mutation. Tumour cellularity did not differ 
between oncogene-negative and oncogene-positive samples (Supplementary
Fig. 9b). Analysis of copy number alterations using GISTIC identified
unique focal ERBB2 and MET amplifications in the oncogene-negative 
subset (Fig. 3a, Supplementary Table 6); amplifications in other wild-type 
proto-oncogenes, including KRAS and EGFR, were not significantly 
different between the two groups. 

We next analysed WES data independently in the oncogene-negative 
and oncogene-positive subsets. We found that TP53, KEAP1, NF1 and 
RIT1 mutations were significantly enriched in oncogene-negative tumours 
(P < 0.01; Fig. 3b, Supplementary Table 12). NF1 mutations have previously
been reported in lung adenocarcinoma, but this is the first study,
to our knowledge, capable of identifying all classes of loss-of-function 
Figure 3 | Identification of novel candidate driver genes. a, GISTIC analysis
of focal amplifications in oncogene-negative (n = 87) and oncogene-positive
(n = 143) TCGA samples identifies focal gains of MET and ERBB2 that are
specific to the oncogene-negative set (purple). b, TP53, KEAP1, NF1 and RIT1
mutations are significantly enriched in samples otherwise lacking oncogene
mutations (adjusted P < 0.05 by Fisher’s exact test). c, Co-mutation plot of
variants of known significance within the RTK/RAS/RAF pathway in lung
adenocarcinoma. Not shown are the 63 tumours lacking an identifiable driver
lesion. Only canonical driver events, as defined in Supplementary Fig. 9, and
proposed driver events, are shown; hence not every alteration found is
displayed. d, New candidate driver oncogenes (blue: 13% of cases) and known
pathway can be found in the majority of the 230 lung adenocarcinomas.
NF1 defects and to statistically demonstrate that NF1 mutations, as well
as KEAP1 and TP53 mutations, are enriched in the oncogene-negative
subset of lung adenocarcinomas (Fig. 3c). All RIT1 mutations occurred
in the oncogene-negative subset and clustered around residue Q79 (homologous
to Q61 in the switch II region of RAS genes). These mutations
transform NIH3T3 cells and activate MAPK and PI(3)K signalling,
supporting a driver role for mutant RIT1 in 2% of lung adenocarcinomas. 
This analysis increases the rate at which putative somatic lung adeno- 
carcinoma driver events can be identified within the RTK/RAS/RAF 
pathway to 76% (Fig. 3d). 
Recurrent alterations in key pathways


Recurrent aberrations in multiple key pathways and processes charac- 
terize lung adenocarcinoma (Fig. 4a). Among these were RTK/RAS/ 
RAF pathway activation (76% of cases), PI(3)K-mTOR pathway activa- 
tion (25%), p53 pathway alteration (63%), alteration of cell cycle regu- 
lators (64%, Supplementary Fig. 10), alteration of oxidative stress pathways 
(22%, Supplementary Fig. 11), and mutation of various chromatin and 
RNA splicing factors (49%). 

We then examined the phenotypic sequelae of some key genomic 
events in the tumours in which they occurred. Reverse-phase protein 
arrays provided proteomic and phosphoproteomic phenotypic evidence 
of pathway activity. Antibodies on this platform are listed in Supplemen- 
tary Table 13. This analysis suggested that DNA sequencing did not 
identify all samples with phosphoprotein evidence of activation of a 
given signalling pathway. For example, whereas KRAS-mutant lung ade- 
nocarcinomas had higher levels of phosphorylated MAPK than KRAS 
wild-type tumours had on average, many KRAS wild-type tumours dis- 
played significant MAPK pathway activation (Fig. 4b, Supplementary 
Fig. 10). The multiple mechanisms by which lung adenocarcinomas 
achieve MAPK activation suggest additional, still undetected RTK/RAS/ 
RAF pathway alterations. Similarly, we found significant activation of 
mTOR and its effectors (p70S6kinase, S6, 4E-BP1) in a substantial frac- 
tion of the tumours (Fig. 4c). Analysis of mutations in PIK3CA and 
STK11, STK11 protein levels, and AMPK and AKT phosphorylation
led to the identification of three major mTOR patterns in lung adeno- 
carcinoma: (1) tumours with minimal or basal mTOR pathway activa- 
tion, (2) tumours showing higher mTOR activity accompanied by either 
STK11-inactivating mutation or combined low STK11 expression and 
low AMPK activation and (3) tumours showing high mTOR activity 
accompanied by either phosphorylated AKT activation, PIK3CA muta- 
tion, or both. As with MAPK, many tumours lack an obvious underlying 
genomic alteration to explain their apparent mTOR activation. 
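
The three mTOR patterns described above can be read as a small set of rules. The sketch below encodes them under stated assumptions: each measurement has already been reduced to a yes/no call, the thresholds behind those calls are not specified in this article, and the class, field and label names are hypothetical. Because the text does not say how a tumour meeting both the STK11/AMPK and the AKT/PIK3CA criteria is assigned, the function returns every matching label rather than imposing a precedence.

```python
from dataclasses import dataclass


@dataclass
class TumourCalls:
    """Hypothetical per-tumour yes/no calls derived from sequencing and RPPA data."""
    high_mtor_activity: bool
    stk11_inactivating_mutation: bool
    low_stk11_expression: bool
    low_ampk_activation: bool
    high_phospho_akt: bool
    pik3ca_mutation: bool


def mtor_patterns(t):
    """Return the list of pattern labels that apply to one tumour."""
    if not t.high_mtor_activity:
        return ["minimal or basal mTOR activation"]
    labels = []
    # Pattern 2: STK11 mutation, or low STK11 expression with low AMPK activation.
    if t.stk11_inactivating_mutation or (t.low_stk11_expression and t.low_ampk_activation):
        labels.append("high mTOR with STK11/AMPK inactivation")
    # Pattern 3: phosphorylated AKT activation, PIK3CA mutation, or both.
    if t.high_phospho_akt or t.pik3ca_mutation:
        labels.append("high mTOR with AKT/PIK3CA activation")
    # High activity without any of the listed explanations.
    return labels or ["high mTOR, no obvious genomic explanation"]


basal = mtor_patterns(TumourCalls(False, False, False, False, False, False))
stk11_driven = mtor_patterns(TumourCalls(True, True, False, False, False, False))
akt_driven = mtor_patterns(TumourCalls(True, False, False, False, True, False))
unexplained = mtor_patterns(TumourCalls(True, False, True, False, False, False))
```

The last example has low STK11 expression without low AMPK activation, so it falls into the unexplained group, mirroring the observation that many tumours lack an obvious alteration to account for their mTOR activation.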


Molecular subtypes of lung adenocarcinoma 


Broad transcriptional and epigenetic profiling can reveal downstream 
consequences of driver mutations, provide clinically relevant classifica- 
tion and offer insight into tumours lacking clear drivers. Prior unsuper- 
vised analyses of lung adenocarcinoma gene expression have used varying 
nomenclature for transcriptional subtypes of the disease. To coordinate
naming of the transcriptional subtypes with the histopathological,
anatomic and mutational classifications of lung adenocarcinoma, we
propose an updated nomenclature: the terminal respiratory unit (TRU,
formerly bronchioid), the proximal-inflammatory (PI, formerly squamoid),
and the proximal-proliferative (PP, formerly magnoid) transcriptional
subtypes (Fig. 5a). Previously reported associations of expression
signatures with pathways and clinical outcomes were observed (Supplementary
Fig. 7b) and integration with multi-analyte data revealed
statistically significant genomic alterations associated with these tran- 
scriptional subtypes. The PP subtype was enriched for mutation of KRAS, 
along with inactivation of the STK11 tumour suppressor gene by chro- 
mosomal loss, inactivating mutation, and reduced gene expression. In 
contrast, the PI subtype was characterized by solid histopathology and 


Figure 4 | Pathway alterations in lung adenocarcinoma. a, Somatic 
alterations involving key pathway components for RTK signalling, mTOR 
signalling, oxidative stress response, proliferation and cell cycle progression, 
nucleosome remodelling, histone methylation, and RNA splicing/processing. 
b, c, Proteomic analysis by RPPA (n = 181) P values by two-sided t-test. 

Box plots represent 5%, 25%, 75%, median, and 95%. PP, proximal 
proliferative; TRU, terminal respiratory unit; PI, proximal inflammatory. 

c, mTOR signalling may be activated, by either Akt (for example, via PI(3)K) or 
inactivation of AMPK (for example, via STK11 loss). Tumours were separated 
into three main groups: those with PI(3)K-AKT activation, through either 
PIK3CA activating mutation or unknown mechanism (high p-AKT); those 
with LKB1-AMPK inactivation, through either STK11 mutation or unknown 
mechanism with low levels of LKB1 and p-AMPK; and those showing none 
of the above features. 
Figure 5 | Integrative analysis. a-c, Integrating unsupervised analyses of 230 
lung adenocarcinomas reveals significant interactions between molecular 
subtypes. Tumours are displayed as columns, grouped by mRNA expression 
subtypes (a), DNA methylation subtypes (b), and integrated subtypes by 


co-mutation of NF1 and TP53. Finally, the TRU subtype harboured the 
majority of the EGFR-mutated tumours as well as the kinase fusion express- 
ing tumours. TRU subtype membership was prognostically favourable, 
as seen previously™* (Supplementary Fig. 7c). Finally, the subtypes exhib- 
ited different mutation rates, transition frequencies, genomic ploidy pro- 
files, patterns of large-scale aberration, and differed in their association 
with smoking history (Fig. 5a). Unsupervised clustering of miRNA 
sequencing-derived or reverse phase protein array (RPPA)-derived data 
also revealed significant heterogeneity, partially overlapping with the 
mRNA-based subtypes, as demonstrated in Supplementary Figs 12 and 13. 

Mutations in chromatin-modifying genes (for example, SMARCA4, 
ARID1A and SETD2) suggest a major role for chromatin maintenance 
in lung adenocarcinoma. To examine chromatin states in an unbiased 
manner, we selected the most variable DNA methylation-specific probes 
in CpG island promoter regions and clustered them by methylation inten- 
sity (Supplementary Table 14). This analysis divided samples into two 
distinct subsets: a significantly altered CpG island methylator phenotype- 
high (CIMP-H(igh)) cluster and a more normal-like CIMP-L(ow) group, 
with a third set of samples occupying an intermediate level of methy- 
lation at CIMP sites (Fig. 5b). Our results confirm a prior report*® and 
provide additional insights into this epigenetic program. CIMP-H tumours 
often showed DNA hypermethylation of several key genes: CDKN2A, 
GATA2, GATA4, GATA5, HIC1, HOXA9, HOXD13, RASSF1, SFRP1, 
SOX17 and WIF1 among others (Supplementary Fig. 14). WNT pathway 
genes are significantly over-represented in this list (P value = 0.0015) 
suggesting that this is a key pathway with an important driving role 
within this subtype. MYC overexpression was significantly associated 
with the CIMP-H phenotype as well (P = 0.003). 

Although we did not find significant correlations between global DNA 
methylation patterns and individual mutations in chromatin remodel- 
ling genes, there was an intriguing association between SETD2 mutation 
iCluster analysis (c). All displayed features are significantly associated with 
subtypes depicted. The CIMP phenotype is defined by the most variable CpG 
island and promoter probes. 


and CDKN2A methylation. Tumours with low CDKN2A expression 
due to methylation (rather than due to mutation or deletion) had lower 
ploidy, fewer overall mutations (Fig. 5c) and were significantly enriched 
for SETD2 mutation, suggesting an important role for this chromatin- 
modifying gene in the development of certain tumours. 

Integrative clustering of copy number, DNA methylation and mRNA 
expression data found six clusters (Fig. 5c). Tumour ploidy and mutation 
rate are higher in clusters 1-3 than in clusters 4-6. Clusters 1-3 frequently 
harbour TP53 mutations and are enriched for the two proximal tran- 
scriptional subtypes. Fisher's combined probability tests revealed signi- 
ficant copy number associated gene expression changes on 3q in cluster 
one, 8q in cluster two, and chromosome 7 and 15q in cluster three (Sup- 
plementary Fig. 15). The low ploidy and low mutation rate clusters four 
and five contain many TRU samples, whereas tumours in cluster 6 have 
comparatively lower tumour cellularity, and few other distinguishing 
molecular features. Significant copy number-associated gene expres- 
sion changes are observed on 6q in cluster four and 19p in cluster five. 
The CIMP-H tumours divided into a high ploidy, high mutation rate, 
proximal-inflammatory CIMP-H group (cluster 3) and a low ploidy, low 
mutation rate, TRU-associated CIMP-H group (cluster 4), suggesting that 
the CIMP phenotype in lung adenocarcinoma can occur in markedly 
different genomic and transcriptional contexts. Furthermore, cluster 
four is enriched for CDKN2A methylation and SETD2 mutations, sug- 
gesting an interaction between somatic mutation of SETD2 and deregulated 
chromatin maintenance in this subtype. Finally, cluster membership 
was significantly associated with mutations in TP53, EGFR and STK11 
(Supplementary Fig. 15, Supplementary Table 6). 
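
The text names Fisher's combined probability test but does not spell it out. As a minimal sketch of the general method (not the authors' pipeline), the Python function below combines independent P values using the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name is hypothetical, and how the per-gene P values were obtained and grouped is not stated here.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Uses the closed form of the chi-square survival function for an even
    number of degrees of freedom (2k), so no external library is needed.
    """
    if not p_values:
        raise ValueError("at least one P value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # sf(X; 2k) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

A single P value is returned unchanged, which is a quick sanity check on the formula.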


Conclusions 


We assessed the mutation profiles, structural rearrangements, copy number 
alterations, DNA methylation, mRNA, miRNA and protein expression 
of 230 lung adenocarcinomas. In recent years, the treatment of lung 
adenocarcinoma has been advanced by the development of multiple 
therapies targeted against alterations in the RTK/RAS/RAF pathway. We 
nominate amplifications in MET and ERBB2 as well as mutations of 
NF1 and RIT1 as driver events specifically in otherwise oncogene-negative 
lung adenocarcinomas. This analysis increases the fraction of lung ade- 
nocarcinoma cases with somatic evidence of RTK/RAS/RAF activation 
from 62% to 76%. While all lung adenocarcinomas may activate this 
pathway by some mechanism, only a subset show tonic pathway acti- 
vation at the protein level, suggesting both diversity between tumours 
with seemingly similar activating events and as yet undescribed mech- 
anisms of pathway activation. Therefore, the current study expands the 
range of possible targetable alterations within the RTK/RAS/RAF path- 
way in general and suggests increased implementation of MET and 
ERBB2/HER2 inhibitors in particular. Our discovery of inactivating 
mutations of MGA further underscores the importance of the MYC 
pathway in lung adenocarcinoma. 

This study further implicates both chromatin modifications and splic- 
ing alterations in lung adenocarcinoma through the integration of DNA, 
transcriptome and methylome analysis. We identified alternative splic- 
ing due to both splicing factor mutations in trans and mutation of splice 
sites in cis, the latter leading to activation of the MET gene by exon 14 
skipping. Cluster analysis separated tumours based on single-gene driver 
events as well as large-scale aberrations, emphasizing lung adenocarci- 
noma’s molecular heterogeneity and combinatorial alterations, includ- 
ing the identification of coincident SETD2 mutations and CDKN2A 
methylation in a subset of CIMP-H tumours, providing evidence of a 
somatic event associated with a genome-wide methylation phenotype. 
These studies provide new knowledge by illuminating modes of geno- 
mic alteration, highlighting previously unappreciated altered genes, and 
enabling further refinement in sub-classification for the improved per- 
sonalization of treatment for this deadly disease. 


METHODS SUMMARY 


All specimens were obtained from patients with appropriate consent from the rele- 
vant institutional review board. DNA and RNA were collected from samples using 
the Allprep kit (Qiagen). We used standard approaches for capture and sequencing of 
exomes from tumour DNA and normal DNA” and whole-genome shotgun sequenc- 
ing. Significantly mutated genes were identified by comparing them with expectation 
models based on the exact measured rates of specific sequence lesions. GISTIC 
analysis of the circular-binary-segmented Affymetrix SNP 6.0 copy number data was 
used to identify recurrent amplification and deletion peaks*'. Consensus clustering 
approaches were used to analyse mRNA, miRNA and methylation subtypes using 
previous approaches’*. The publication web page is (https://tcga-data.nci.nih.gov/ 
docs/publications/luad_2014/). Sequence files are in CGHub (https://cghub.ucsc.edu/). 
Topoisomerase II mediates meiotic 
crossover interference 


Liangran Zhang, Shunxin Wang, Shen Yin, Soogil Hong, Keun P. Kim & Nancy Kleckner 


Spatial patterning is a ubiquitous feature of biological systems. Meiotic crossovers provide an interesting example, 
defined by the classic phenomenon of crossover interference. Here we identify a molecular pathway for interference 
by analysing crossover patterns in budding yeast. Topoisomerase II plays a central role, thus identifying a new function 
for this critical molecule. SUMOylation (of topoisomerase II and axis component Red1) and ubiquitin-mediated removal 
of SUMOylated proteins are also required. The findings support the hypothesis that crossover interference involves 
accumulation, relief and redistribution of mechanical stress along the protein/DNA meshwork of meiotic chromosome 
axes, with topoisomerase II required to adjust spatial relationships among DNA segments. 


During meiosis, crossovers promote genetic diversity and create physical 
connections between homologues that ensure their accurate segregation 
(reviewed in refs 1-3). Crossovers arise stochastically from a larger set of 
undifferentiated precursor recombination complexes, having different 
positions in different nuclei. Nonetheless, along any given chromosome 
in any given nucleus, crossovers tend to be evenly spaced (reviewed in refs 
3 and 4). This genetic phenomenon is termed crossover interference®®. 

Crossover interference implies the occurrence of communication 
along chromosomes, over distances ranging from 300 nm to more than 
30 μm (refs 4, 7 and 8). Some models for crossover interference invoke 
spreading of a molecular-based change along the chromosomes’. Even 
spacing can also be achieved by a reaction-diffusion process. We have 
proposed, alternatively, that interference involves the accumulation, 
relief and redistribution of mechanical stress, with spreading molecular 
changes following as a consequence of spreading stress relief*. Aberrant 
crossover patterns are observed in mutants defective for recombination, 
chromosome structure, chromatin state and DNA-based signal trans- 
duction. However, no specific molecular process has been defined. To 
address this deficit, we examined crossover patterns in wild-type (WT) 
and mutant strains of budding yeast as defined by cytological local- 
ization of crossover-correlated molecular foci. 


Crossover interference in WT meiosis 


Mammals, plants and fungi share a common meiotic recombination 
program. Recombination initiates by programmed double-strand breaks 
(DSBs), which occur as chromosome structural axes develop''’””. Each 
DSB identifies its homologous partner duplex and mediates whole chro- 
mosome pairing. As a result, homologue structural axes are co-aligned, 
linked by bridging recombination complexes’. Crossover patterning is 
thought to act upon these bridging interactions'*“, designating a subset 
to be crossovers, with accompanying interference’*”*. In yeast, crossover 
designation locally nucleates formation of synaptonemal complex between 
homologue axes. Synaptonemal complex then spreads along the 
lengths of the chromosomes. Correspondingly, crossover patterning and 
interference are independent of synaptonemal complex formation’*’””* 
(below). 

In yeast, foci of E3 ligase Zip3, which specifically mark the sites of 
patterned crossovers*'*, serve as an early marker for crossover inter- 
ference analysis (Methods). Zip3 foci emerge immediately following 
crossover designation, thus avoiding complications arising during 


formation of actual crossover products*. Also, Zip3 foci do not mark 
the sites of additional crossovers that arise by other routes® (Methods). 

Zip3-MYC foci were visualized along synaptonemal complexes of 
surface-spread pachytene chromosomes by wide-field epifluorescence® 
(Fig. 1a, b). Each Zip3 focus position was defined, to an accuracy of 
approximately one pixel (67 nm) along a particular marked chromo- 
some in ~200-300 nuclei, thus defining patterns with a high degree of 
reproducibility and accuracy* (Supplementary Table 1). Using these 
position data, the distance along a chromosome over which the inter- 
ference signal is detectable (that is, the ‘interference distance’, L) is 
defined by three different approaches (Fig. 1c-e). In each case, L is 
given in units of physical distance (rationale below), micrometres of 
synaptonemal complex, which is a proxy for chromosome length at late 
leptotene when crossover designation actually occurs (above). 

Crossover interference is classically described by coefficient of coincid- 
ence (CoC) analysis*** (Fig. 1c; see Methods). Zip3 foci along three chro- 
mosomes of different sizes (330-1,530 kilobases (kb)) exhibit classic 
coefficient of coincidence relationships (Fig. 1d, left column). For intervals 
that are close together, bivalents exhibiting a focus in each interval (‘double 
events’) are much rarer than expected, reflecting operation of interference; 
as the inter-interval distance increases, double-event frequencies progres- 
sively approach, then reach, that expected for independent occurrence, 
where the observed frequency is the same as the expected frequency (co- 
efficient of coincidence = 1). At even longer intervals, coefficient of coin- 
cidence values can exceed 1, reflecting the tendency for even spacing*. For 
convenience, we define the interference distance described by such curves 
as the inter-interval distance at which the coefficient of coincidence = 0.5, 
namely L_CoC (Fig. 1d, left column). The three analysed chromosomes 
exhibit virtually identical coefficient of coincidence curves and values of 
L_CoC = 0.3 ± 0.01 μm (n = 2-4; Fig. 1d, left column; Methods). 
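
The coefficient of coincidence calculation described above can be sketched in Python. This is a simplified illustration, not the procedure from the Methods: it assumes focus positions are given in micrometres along bivalents of a common length, that the chromosome is divided into equal-width intervals, and that the interference distance is read off by linear interpolation where the curve first rises through 0.5. The function names `coc_curve` and `l_coc` are hypothetical.

```python
from collections import defaultdict

def coc_curve(bivalents, length_um, n_intervals):
    """Return {inter-interval distance (um): mean coefficient of coincidence}.

    bivalents: list of lists of focus positions (um), one list per bivalent.
    """
    width = length_um / n_intervals
    n = len(bivalents)
    # occupancy[b][i] is True when bivalent b has at least one focus in interval i
    occupancy = []
    for foci in bivalents:
        row = [False] * n_intervals
        for x in foci:
            row[min(int(x / width), n_intervals - 1)] = True
        occupancy.append(row)
    freq = [sum(row[i] for row in occupancy) / n for i in range(n_intervals)]
    by_steps = defaultdict(list)
    for i in range(n_intervals):
        for j in range(i + 1, n_intervals):
            expected = freq[i] * freq[j]  # double-event frequency if independent
            if expected == 0:
                continue
            observed = sum(row[i] and row[j] for row in occupancy) / n
            by_steps[j - i].append(observed / expected)
    return {round(steps * width, 6): sum(v) / len(v)
            for steps, v in sorted(by_steps.items())}

def l_coc(curve, threshold=0.5):
    """Distance at which the curve first rises through the threshold, or None."""
    points = sorted(curve.items())
    for (d0, c0), (d1, c1) in zip(points, points[1:]):
        if c0 < threshold <= c1:
            return d0 + (threshold - c0) * (d1 - d0) / (c1 - c0)
    return None
```

Values well below 1 at short distances indicate interference; values near 1 indicate independent occurrence.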

We previously described a stress-and-stress relief mechanism for 
crossover patterning (the ‘beam-film’ (BF) model). Beam-film-predicted 
crossover patterns are defined by simulation analyses* that can accur- 
ately describe crossover patterns in diverse organisms, including yeast 
(Fig. 1d, middle and right columns). The beam-film parameter (L) is the 
distance over which the interference signal spreads along the chromo- 
somes and corresponds to the distance at which the predicted coefficient 
of coincidence = 0.5, namely Lgp. Beam-film simulations give the same 
value of L and Lpp ~ 0.3 um for all three analysed yeast chromosomes 
(Fig. 1d, middle column). 
Crossover interference can be examined by a modified coefficient 
of coincidence analysis (MCoC; Fig. 1e; Methods). The three analysed 
yeast chromosomes exhibit the same average L_MCoC of ~0.3 μm. 
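
The Fig. 1e legend describes the modified coefficient of coincidence test for one reference/test interval pair: chromosomes with and without a crossover in the reference interval are compared for crossovers in the test interval using Fisher's exact test. A minimal Python sketch of that single comparison follows. The one-sided form, the significance threshold `alpha` and the function names are assumptions; the text does not state them.

```python
from math import comb

def fisher_exact_less(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table

                     test CO   no test CO
        ref CO          a          b
        no ref CO       c          d

    i.e. the probability of seeing a or fewer double events by chance.
    """
    row_ref, row_noref, col_test = a + b, c + d, a + c
    denominator = comb(row_ref + row_noref, col_test)
    k_min = max(0, col_test - row_noref)
    return sum(comb(row_ref, k) * comb(row_noref, col_test - k)
               for k in range(k_min, a + 1)) / denominator

def interference_detected(ref_has_co, test_has_co, alpha=0.05):
    """ref_has_co / test_has_co: per-chromosome booleans for the two intervals.

    alpha is a hypothetical threshold chosen for illustration.
    """
    pairs = list(zip(ref_has_co, test_has_co))
    a = sum(1 for r, t in pairs if r and t)
    b = sum(1 for r, t in pairs if r and not t)
    c = sum(1 for r, t in pairs if not r and t)
    d = sum(1 for r, t in pairs if not r and not t)
    return fisher_exact_less(a, b, c, d) < alpha
```

Counting, for each reference interval, how many successive test intervals return True would then give the per-interval distance in units of intervals.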


Crossover interference requires topoisomerase II 

Topoisomerase II (TopoII) alleviates topological stresses within chromosomes. 
If crossover interference involves mechanical stress along 
the chromosomes, TopoII could be a key player. We assessed crossover 
interference in three mutants with altered TopoII states (Fig. 2 
and Extended Data Figs 1-3). (1) TopoII was depleted using a pCLB2-TOP2 
fusion that expressed TopoII in vegetative cells but not meiosis. 
(2) TopoII catalytic activity was eliminated in meiosis by expressing a 
catalytically inactive allele (top2YF) under its native promoter in a 
pCLB2-TOP2 strain, leaving top2YF as the only TOP2 allele expressed during 
meiosis. (3) SUMOylation of TopoII at several carboxy (C)-terminal 
residues was eliminated by mutation. All three top2 mutant strains 
grow well vegetatively, progress to the pachytene stage of meiosis and 
Figure 1 | Crossover interference in WT meiosis. a, Spread yeast pachytene 
chromosomes labelled for synaptonemal complex component Zip1 (red), 
crossover-correlated Zip3 foci (green) and a lacO/LacI-GFP (green fluorescent 
protein) array at the end of chromosome XV (blue). b, Positions of Zip3 foci 
along a chromosome XV bivalent and total synaptonemal complex were 
determined in a single continuous trace. c, Definition of coefficient of 
coincidence. CO, crossover. d, Coefficient of coincidence and average number/ 
distribution of crossovers on chromosomes (Chr) XV, III and IV (black). Bars, 
s.e.m. Beam-film best-fit simulations in green. L_CoC and L_BF = 0.3 ± 0.01 μm 
for all three chromosomes (n = 4, 3 and 2 experiments, 200-300 bivalents 
each). e, Modified coefficient of coincidence analysis defines, for each interval, 
the number of adjacent intervals affected by crossover interference. Top left: 
each interval is considered individually as a reference interval (Ref). 
Chromosomes that do or do not contain a crossover in that interval (CO+R, 
CO−R) are evaluated for the number that do or do not contain a crossover in a 
second (nearby) interval (Test; CO+T, CO−T). Fisher’s exact test is applied to 
determine whether there were fewer crossovers in the CO+R group versus the 
CO−R group, implying interference emanating from the reference interval to 
the test interval. Top right: number of nearby test intervals where interference 
was detected in one direction from the reference interval gives L_MCoC for that 
interval. Bottom: average L_MCoC for all reference intervals along a chromosome 
(0.16 μm per interval): L_MCoC ≈ 0.3 μm for all three chromosomes (Extended 
Data Fig. 3a). 


exhibit normal synaptonemal complex morphology and length” 
(Extended Data Fig. 3). Meiotic TopoII levels and localization were 
severely reduced in pCLB2-TOP2 and not detectably changed in other 
mutants (Extended Data Fig. 1). 

In all three top2 mutant strains, for all three analysed chromosomes, 
the interference distance as defined by L_CoC, L_BF and L_MCoC 
decreased from ~0.3 μm in WT to ~0.2 μm (Fig. 2a, b and Extended 
Data Figs 2 and 3). Reduced interference should be accompanied by 
an increased number of crossovers, and, in all cases, the distribution 
of Zip3 foci per bivalent was shifted to higher values (Fig. 2a, b and 
Extended Data Fig. 2). 

For pCLB2-TOP2, the interference defect was confirmed by a fourth 
approach. Meiotic crossover patterns are characterized by ‘crossover 
homeostasis’. A decrease or increase in the frequency of DSBs (and 
thus crossover precursor interactions) necessarily changes the cross- 
over frequency. However, the magnitudes of such changes are less than 
proportional to the change in DSB/precursor frequency, implying a 
homeostatic effect. Crossover homeostasis is a direct consequence of 
crossover interference***: homeostatic disparity is greater or less when 
crossover interference is stronger or weaker, and absent when crossover 
interference is absent. This interplay is predicted, and can be quantified, 
by beam-film simulations® (Fig. 2d). 
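
The link between interference and homeostasis can be illustrated with a toy simulation. The Python sketch below is not the beam-film model: it swaps in a much simpler hard-exclusion rule, in which precursors are visited in random order and one becomes a crossover only if no existing crossover lies within a fixed distance. The chromosome length, the exclusion distances and all function names are hypothetical values chosen for illustration.

```python
import random

def designate(precursors, exclusion_um, rng):
    """Visit precursors in random order; keep one as a crossover only if no
    already-designated crossover lies closer than exclusion_um."""
    order = list(precursors)
    rng.shuffle(order)
    crossovers = []
    for x in order:
        if all(abs(x - c) >= exclusion_um for c in crossovers):
            crossovers.append(x)
    return crossovers

def mean_crossovers(n_precursors, exclusion_um, length_um=1.5,
                    trials=2000, seed=0):
    """Average crossover number over many simulated bivalents.

    length_um is a hypothetical axis length, not a measured value.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        precursors = [rng.uniform(0.0, length_um) for _ in range(n_precursors)]
        total += len(designate(precursors, exclusion_um, rng))
    return total / trials

if __name__ == "__main__":
    for exclusion in (0.0, 0.2, 0.3):
        row = [round(mean_crossovers(n, exclusion), 2) for n in (4, 8, 16)]
        print(f"exclusion {exclusion} um: crossovers at N=4, 8, 16 -> {row}")
```

With no exclusion, every precursor becomes a crossover and the relationship is strictly proportional. With exclusion, doubling the precursor number raises the crossover number by much less than a factor of two, which is the homeostatic behaviour described above.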

To evaluate crossover homeostasis experimentally, the number of 
Zip3 foci along a given chromosome was determined in a series of 
strains that exhibited different levels of DSBs (precursors). Decreased 
and increased levels are conferred by hypomorphic mutations in DSB 
transesterase Spo11 and a tel1Δ mutation, respectively (Extended Data 
Fig. 4 and Fig. 2d). In a TOP2 background, homeostasis is apparent in 
the nonlinear relationship of Zip3 focus number to DSB number (chromosomes 
XV and III; Fig. 2d, filled black circles, and Extended Data Fig. 4). 
Moreover, the experimentally defined relationships occur at exactly the 
level of interference predicted to occur in WT meiosis by best-fit beam-film 
simulation analysis (L_BF ≈ 0.3 μm; above; Fig. 2d). 

If pCLB2-TOP2 reduces the interference distance, it should bring 
the relationship between Zip3 focus number and DSB number closer 
to the linear proportionality seen in the absence of interference. This 
prediction is fulfilled (chromosomes XV and III; Fig. 2d, filled pink 
circles, and Extended Data Fig. 4). Furthermore, the mutant relationships 
again occur specifically at the interference distance predicted by 
best-fit beam-film simulation analysis for this mutant (L_BF ≈ 0.2 μm; 
Fig. 2d and Extended Data Fig. 4). These results confirm the existence 
of an interference defect in pCLB2-TOP2 and provide further evid- 
ence that the beam-film model can accurately describe crossover pat- 
terns (see also Extended Data Fig. 4). 
Crossover interference requires SUMO and STUbL 
SUMOylation of TopoII requires Ubc9, the only known SUMO-E2 of 
yeast. Another Ubc9 substrate is meiotic axis component Red1 (ref. 
20). Mutation of the SUMOylation patch of Red1, which dramatically 
reduces the level of modification (red1KR), confers the same altered 
Zip3 focus patterns as top2 mutations, including top2SNM (Fig. 3a and 
Extended Data Fig. 3). Interestingly, the non-null allele ubc9-GFP also 
exhibits this phenotype (Fig. 3a and Extended Data Fig. 3), as well as an 
elevated level of crossovers as defined genetically. 

Crossover interference also requires STUbL protein Slx5/8. Slx5/8 
ubiquitinates SUMOylated proteins, targeting them for removal from 
their cognate complexes. Absence of Slx5/8 activity confers a strong 
global increase in protein SUMOylation during meiosis (Extended 
Data Fig. 5). Absence of either Slx5 or Slx8, or mutational abrogation 
Figure 2 | Crossover interference in top2 mutants. a, b, All three top2 
mutants show decreased crossover interference by all criteria (L_CoC, L_BF, 
L_MCoC) and correspondingly increased crossover frequency. a, WT, top2 and 
beam-film (BF) simulation data (black, pink and green). c, The basis for 
crossover homeostasis*. CO, crossover; NCO, non-crossover. At lower (higher) 
precursor density (black vertical lines; left (right)), a given precursor will be less 
(more) likely to experience interference emanating from nearby crossovers 
(indicated by fewer (more) blue lines), giving an increased (decreased) 
probability of a crossover at each individual position, and thus along the whole 
chromosome length. The magnitudes of these effects will be greater or lesser 
according to the strength of crossover interference (and zero in its absence). 
d, Quantitative evaluation of crossover homeostasis on chromosome XV. Lines: 
relationship of crossover number to precursor number (parameter N) predicted 
by beam-film simulations at varying interference levels (L_BF = interference 
distance, L; other parameters appropriate to WT yeast meiosis*). Crossover 
homeostasis decreases with decreasing crossover interference. Filled circles: 
strains exhibiting altered DSB levels (top) were analysed for Zip3 foci in TOP2 
(black) and pCLB2-TOP2 (pink) backgrounds (Extended Data Fig. 4). Average 
frequency of Zip3 foci per bivalent plotted versus DSB ( = precursor) number 
(vertical lines indicate s.d.). pCLB2-TOP2 differs experimentally from WT in 
the direction expected for decreased crossover interference. Experimental data 
for WT and pCLB2-TOP2 both quantitatively match the relationships predicted 
for their corresponding interference levels by beam-film simulations (L_BF = 0.3 
and 0.2 μm, respectively; Fig. 2a, b). 


of either the Slx5 SUMO-binding motif or the Slx8 ubiquitin ligase 
motif (slx5Δ, slx8Δ, slx5-SIM or slx8-SS), confers the same changes 
in Zip3 focus patterns as top2, red1KR and ubc9-GFP (Fig. 3b and 
Extended Data Figs 2 and 3). The slx5Δ defect is confirmed genetically 
(Extended Data Fig. 3 and Supplementary Table 2). 

The sirtuin, Sir2, enables Slx5/8 STUbL activity and is required for 
crossover interference via that activity. The absence of Sir2 (sir2Δ) or 
specific elimination of the interaction of Sir2 with Slx5/8 (sir2RK) con- 
fer the same changes in Zip3 focus patterns as all of the other mutations 
analysed above, by all criteria (Fig. 3c and Extended Data Figs 2 and 3). 
The interference defect in sir2RK was confirmed genetically (Extended 
Data Fig. 3 and Supplementary Table 2). 

The role of Sir2 in interference is specific to this one function. 
Elimination of other Sir2-mediated activities (histone deacetylase cata- 
lysis (sir2-345); interaction with Sir2 partners required for silencing 
(deletion mutants of Sir3, Sir4, Esc2 and Esc8); cohesion (sir2ΔC500)) 
does not alter crossover interference (Fig. 3c and Extended Data Figs 3 
and 6). 


A single TopoII crossover interference pathway 

Not only do all analysed mutants exhibit the same quantitative defects 
in crossover interference and crossover number as defined by Zip3 
focus patterns (Figs 2-4 and Extended Data Figs 2 and 3), but double 
mutants carrying combinations of single mutations also exhibit these 
same phenotypes (Fig. 4a,b). Thus, the described mutants define a 
single molecular pathway. 

This pathway may directly implement the spreading interference 
signal, but other perturbations are not excluded (Supplementary Dis- 
cussion). These results cannot be explained by (1) prolongation of the 
crossover-designation period, (2) higher DSB/precursor levels (Extended 
Data Figs 4, 7 and 8) or (3) obviously altered axis organization, since all 
mutants exhibit WT synaptonemal complex lengths (Extended Data Fig. 3). 
All mutants exhibit reduced evenness of spacing as defined by gamma 
distribution analysis (Supplementary Discussion). 


The obligatory crossover does not require interference 

Since a crossover is required for meiotic homologue segregation, every 
pair of homologues must acquire at least one (the ‘obligatory crossover’)’. 
The frequency of zero-Zip3 focus chromosomes is less than 10⁻³ for 
chromosomes IV and XV and ~1% for chromosome III because it is 
small*. None of the identified interference-defective mutants exhibits 
an increased frequency of zero-Zip3 foci chromosomes (Figs 1-4 and 
Figure 3 | Crossover interference requires post-translational modification. 
a-c, WT and mutant crossover patterns (black; colours). Quantitatively similar 
decreases in crossover interference and increases in crossover number are seen in 
ubc9-GFP (SUMO E2; brown), red1KR (non-SUMOylated Red1; cyan), strains 
lacking Slx5 or Slx8 (slx5Δ or slx8Δ) or mutated for the Slx5 SUMO-binding 


Extended Data Fig. 2). This result argues against models in which crossover 
interference is required to ensure the obligatory crossover*” whereas the 
beam-film model predicts this phenotype’. 


The crossover interference metric is physical distance 
We analysed Zip3 focus patterns in strains whose pachytene synap- 
tonemal complex lengths differ from those of the reference WT SK1 
strain (Fig. 5 and Extended Data Fig. 9). These strains exhibit different 
interference distances when the metric used is genomic length (kilo- 
bases) but exactly the same (WT) interference distance when the metric 
is physical length as micrometres of synaptonemal complex (Fig. 5; 
compare top and bottom panels). Beam-film simulations give the same 
relationships (Extended Data Fig. 9a-c). Thus, in budding yeast, the 
metric for spreading crossover interference is physical chromosome distance, 
as in mouse, Arabidopsis, human and tomato. Differences in 
synaptonemal complex length probably result from altered chromatin 
loop lengths (kilobases) without a change in basic axis structure’**. In all 
cases, experimental Zip3 focus distributions are matched by beam-film 
simulations that use the WT value for interference distance (L_BF). These 
and other details (Extended Data Fig. 9 legend) provide further evidence 
of the precision with which the beam-film model explains diverse cross- 
over patterns. 
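
To make the distinction between the two metrics concrete, the short Python sketch below converts a fixed physical interference distance into a genomic distance for strains with different synaptonemal complex lengths. The chromosome size and the synaptonemal complex lengths used in the usage note are hypothetical numbers chosen for illustration, not measurements from this study, and the function name is an assumption.

```python
def interference_distance_kb(l_um, chromosome_kb, sc_length_um):
    """Convert a physical interference distance (um of synaptonemal complex)
    into a genomic distance (kb), assuming uniform DNA packing along the axis."""
    if sc_length_um <= 0:
        raise ValueError("synaptonemal complex length must be positive")
    return l_um * chromosome_kb / sc_length_um
```

For a hypothetical 1,000 kb chromosome, an interference distance of 0.3 μm corresponds to about 150 kb when the synaptonemal complex is 2 μm long and about 75 kb when it is 4 μm long. The physical distance is unchanged while the genomic distance halves, which is the pattern described above for strains with altered axis length.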


The TopoII interference pathway is highly specific 

None of more than 20 other examined mutants exhibit altered Zip3 
focus patterns, including those with the following: (1) altered axis 
composition (condensin, pch2Δ); (2) lacking either a sister chromatid (cdc6) 
or any/normal synaptonemal complex (zip1Δ; msh4Δ) (Fig. 5; discussion 
in Extended Data Fig. 9a and Methods); or (3) deleted for Sir2 
relative, Hst1; ATM homologue Tel1; meiotic telomere/motion protein 
Ndj1; chromodomain protein Dot1; DSB-triggered γ-H2A; TopoII-colocalizing 
Nse1/Smc5/6; nucleosome density factor Yta7; Mph1, Mlh1/3 
and Mms4 (recombination resolution); or Msh2 (mismatch repair) 
(Extended Data Fig. 6; L.Z., unpublished observations). 


Discussion 


Our findings show that Topoisomerase II is essential for normal CO inter- 
ference. Further, crossover interference is mediated by communication 
motif or the Slx8 ubiquitin ligase motif (slx5-SIM, slx8-SS) (magenta), or lacking 
Sir2 (sir2Δ) or mutated for the Sir2/Slx5 interaction site (sir2RK) (blue). 
Crossover interference does not require Sir2 deacetylation activity (sir2-345), 
Sir2 interaction partner Sir4 (sir4Δ) (grey) or other Sir2 activities/partners (text). 


along prophase chromosome structural axes (Fig. 6a). The TopoII 
interference pathway involves SUMOylation of Red1, a prominent 
meiotic axis component. TopoII itself occurs prominently along meiotic 
prophase axes, in yeast and mammals, and along the structural 
axes of mammalian mitotic late-stage chromosomes, to which meiotic 
axes are related. Moreover, the TopoII interference pathway requires 
SUMOylation of TopoII as well as Red1. In mitotic mammalian cells, 
SUMOylated TopoII is implicated in late-stage chromosome structural 
axes, and in yeast, SUMOylated TopoII occurs preferentially in centromere 
regions which, during meiosis, mimic crossover-designation/ 
interference sites by nucleating synaptonemal complex formation. 
Spreading of interference along the axis matches our finding that the 
relevant metric is physical chromosome distance and the inference that 
variations in synaptonemal complex length in different mutants result 
from variations in loop length rather than basic axis structure. Finally, 
spreading along the axis explains how the interference signal is first 
generated by, then sensed by, biochemical recombination complexes, 
which are intimately embedded in the axes from their first inception as 
pre-DSB ensembles’’. Notably, the meiotic prophase axis probably 
comprises a meshwork of DNA segments joined by linker proteins'**”” 
(Fig. 6a, b). 

Most importantly, crossover interference requires the catalytic 
activity of TopoII. Since TopoII activity does not require input of 
external energy from ATP hydrolysis, its reactions must be driven 
forward, and given directionality, by their substrates, which are changed 
by TopoII from a higher potential energy state to a lower potential 
energy state. If the substrate for TopoII during crossover interference is 
the axis meshwork (above), that meshwork is first placed in a high 
potential energy state and then, in response to crossover designation, 
undergoes relaxation, dependent upon TopoII activity. That is, the 
axis meshwork begins in a mechanically stressed state and is then 
relaxed to a less mechanically stressed state dependent upon TopoII. 
This progression closely matches the proposed stress and stress relief 
mechanism for crossover patterning*’* (Methods): stress accumulates 
along the chromosomes and provokes local crossover designation 
which, by its intrinsic nature, results in local relief of stress. That local 
change then redistributes along the chromosomes, emanating out- 
wards from its nucleation site, reducing stress and thereby disfavouring 
additional stress-promoted crossover designations in the affected regions. 

Figure 4 | A single pathway for crossover interference. a, Representative 
double mutants and component single mutants exhibit the same quantitative 
defect in crossover interference and increased crossover number (colours and 
black) versus WT (dashed line). b, Crossover interference and crossover 
number phenotypes for all mutants (Figs 1–5). WT (black), sir2-345 and sir4Δ 
(grey), top2 mutants (pink), sir2Δ and sir2RK (blue), slx5/8 mutants (purple), 
ubc9-GFP (brown), red1KR (light blue), double mutants (red); mutants 
with altered axis length showing WT phenotype (green). 

In this context, what is the source of meshwork stress and how does 
TopoII alleviate that stress? We previously suggested that mechanical 
stress arises from axis-constrained global chromatin expansion; crossover 
designation and interference then involve local nucleation and 
spreading of chromatin/axis compaction (Fig. 6b). TopoII could act 
during compaction to adjust spatial relationships among DNA segments 
within the axis meshwork (Fig. 6b), thereby implementing both local 
relief of stress and its redistribution. The stress-relief role of TopoII is 
thus specifically targeted to the compaction process, and hence to regions 
undergoing crossover designation/interference. This role also explains 
why the TopoII pathway is important, but not absolutely essential, for 
crossover interference: in its absence, the basic process of spreading 
stress relief would occur, but full relaxation would not be possible without 
meshwork readjustment (Fig. 6b). Interestingly, mitotic chromosomes 
are constrained by topologically sensitive linkages and collapse 
upon removal of protein/DNA links, exactly as expected for a meshwork 
under expansion stress. 

We further note that the beam-film model, formulated to quantitatively 
describe the predictions of a stress and stress-relief mechanism, 
accurately and quantitatively describes diverse crossover patterning 
data for WT meiosis, including crossover homeostasis, in yeast and 
other organisms (Figs 1d and 2d), as well as crossover patterning in 
mutants. These include the following: (1) crossover interference, crossover 
number and crossover homeostasis in mutants defective in the 
TopoII interference pathway (Fig. 2a, d; not shown); (2) crossover 
patterns at varying DSB levels in those mutants (Extended Data Fig. 4); 
and (3) crossover patterns in mutants with altered axis lengths (Extended 
Data Fig. 9a, b). Recent findings in Caenorhabditis elegans can also be 
directly explained by such a model (Supplementary Discussion). Importantly, 
however, the mathematical formulation of the beam-film model 
can equivalently describe any mechanism involving progressive ‘event 
designation’ and resulting interference that decays exponentially away 
from the designation site. Thus, proof that crossover patterning involves 
macroscopic mechanical effects requires direct identification of such 
effects. 
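
The point that any progressive 'event designation' process with exponentially decaying interference fits the same formalism can be illustrated with a toy simulation. The sketch below is not the beam-film program and makes no mechanical assumptions; the function name, the multiplicative damping rule, the threshold and all parameter values are hypothetical choices made only for illustration.

```python
import math
import random


def designate_events(precursors, sensitivities, decay_length, threshold=0.5):
    """Sequentially designate events among precursor positions (toy model).

    Each precursor starts with a 'drive' equal to its sensitivity. The
    precursor with the highest remaining drive is designated; every other
    precursor then has its drive multiplied by 1 - exp(-d / decay_length),
    where d is its distance from the designated site, so inhibition decays
    exponentially with distance. Designation stops when no remaining drive
    reaches `threshold`.
    """
    drive = list(sensitivities)
    remaining = set(range(len(precursors)))
    designated = []
    while remaining:
        best = max(remaining, key=lambda i: drive[i])
        if drive[best] < threshold:
            break
        designated.append(precursors[best])
        remaining.discard(best)
        for i in remaining:
            d = abs(precursors[i] - precursors[best])
            drive[i] *= 1.0 - math.exp(-d / decay_length)
    return sorted(designated)


# Hypothetical input: 13 precursors placed at random on a 3 um axis.
rng = random.Random(0)
positions = sorted(rng.uniform(0.0, 3.0) for _ in range(13))
sens = [rng.uniform(0.6, 1.0) for _ in positions]
events = designate_events(positions, sens, decay_length=0.3)
```

With a very short decay length every precursor is designated, and with a very long one only the first is, which is the qualitative behaviour expected of any mechanism in this class.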

Finally, our results implicate SUMOylation (of Red1 and TopoII, 
probably among multiple targets) and ubiquitin-targeted removal of 
SUMOylated proteins in the TopoII crossover interference pathway. 
These effects presumably act sequentially on the same molecules, which 
are first specifically SUMOylated and then targeted for removal via 
STUbL activity. SUMOylation might establish preconditions for crossover 
interference whose subsequent implementation would require 
removal of those SUMOylated proteins. Alternatively, SUMOylation 
and STUbL activity might compete actively in a single aspect of the 
patterning process; or SUMOylation might function only to target protein 
removal. For yeast TopoII, absence of SUMOylation (in top2SNM) 
decreases the mobility of chromosome-bound TopoII, perhaps promoting 
repeated cycles of TopoII catalytic activity. 


Figure 5 | The metric of crossover interference is physical 
chromosomal length (micrometres). a, b, Coefficient of coincidence 
relationships for strains with different axis lengths (red) relative to WT SK1 
(black). a, Interference lengths differ in genomic distance (kilobases) (top) 
but are the same in physical distance (micrometres of synaptonemal 
complex (SC)) (bottom). b, L_CoC values from a with corresponding 
linear regression lines. c, Zip3 focus frequencies vary linearly with bivalent 
axis/synaptonemal complex length. 
[Panel axes: inter-interval distance (kb) and inter-interval distance (μm) in a; axis length (μm) in b and c.] 


31 JULY 2014 | VOL 511 | NATURE | 555 


©2014 Macmillan Publishers Limited. All rights reserved 




[Figure 6 panel labels: chromatin-axis meshwork; chromatin (DAPI); scale bar, 5 μm; contraction; TopoII catalysis] 

Figure 6 | Proposed role of TopoII for crossover interference. 
a, Chromosomes at the crossover-designation stage (late leptotene), visualized 
in Sordaria, suggest that the axis (identified with Spo76-GFP) incorporates a 
significant fraction of chromatin (stained with DAPI) in a DNA/protein 
structural meshwork. b, Model. Top: global chromatin expansion within the 
structural axis meshwork is constrained by meshwork tethers, giving an 
expanded, mechanically stressed meshwork state. Bottom: spreading 
interference creates a more contracted state with resulting reduction in 
mechanical meshwork stress. Full implementation of contraction, and thus 
maximal spreading of interference, requires readjustment of spatial 
relationships among component DNA segments which, comprising 
topologically closed domains, require TopoII-mediated duplex/duplex 
passages (yellow stars). 


METHODS SUMMARY 


Analysed yeast strains were isogenic SK1 derivatives (Extended Data Table 1). Details 
of analyses are described in ref. 8, Methods and Extended Data. Zip3 focus positions 
and synaptonemal complex lengths for all experiments are in Supplementary Table 1. 
Data for BR strains (Fig. 5) were provided by J. Fung. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


Strains. Yeast strains are isogenic derivatives of SK1 (Extended Data Table 1) 
except for BR strains (Fig. 5), for which Zip2 foci data were provided by J. Fung 
(ref. 18). 

Pachytene Zip2/Zip3 foci mark the sites of patterned (‘interfering’) cross- 
overs. In budding yeast, as in many organisms, the majority of crossovers arise 
as the consequence of the programmed patterning process characterized by cross- 
over interference. However, a minority of crossovers arise in some other way. The 
two types of crossover are referred to as ‘patterned’, ‘class I’ or ‘interfering’, and as 
‘class II’ or ‘non-interfering’, respectively. We prefer to avoid the terms ‘interfering’ 
and ‘non-interfering’ for reasons discussed below. 

There are a total of approximately 90 crossovers per yeast nucleus per round of 
meiosis as defined both by microarray and genetic analyses. Mutant analysis 
suggests that the patterned (class I) crossovers constitute about 70% of total cross- 
overs (estimates range from 60% to 90% in different studies; see, for example, refs 
47,48). Approximately 70% of about 90 total crossovers implies about 63 patterned 
(class I) crossovers per nucleus. 

Zip2/3 foci appear to specifically mark the sites of patterned (class I) crossovers 
by several criteria, as follows. 

First, there are approximately 65 foci of Zip2, Zip3 and Msh4/5 on yeast pachytene 
chromosomes per nucleus, and these different types of focus are highly co-localized 
with one another, implying that they mark the same specific set of recombinational 
interactions. These foci also co-localize with DSB formation/repair components, 
for example Mre11 and Rad51/Dmc1, implying that they mark the sites of 
recombinational interactions (see, for example, refs 18, 19, 50; L.Z., unpublished 
observations). The number of these foci corresponds well with the predicted number 
of patterned crossovers (above). Furthermore, crossover levels defined genetically 
co-vary with the number of Zip2/3 and Msh4/5 foci in mutants examined, for 
example sgs1Δ, tel1Δ and spo11 hypomorphs, implying that they represent an 
important majority of recombinational interactions (refs 24, 52, 53 and this study). 
Additionally, Zip2/3 and Msh4/5 have all been implicated specifically in maturation 
of patterned/interfering crossovers (see, for example, refs 8, 18, 19, 44, 50, 51). 

Second, Zip2 and Zip3 foci exhibit robust interference as shown both by coef- 
ficient of coincidence relationships for random adjacent pairs of intervals and by 
full coefficient of coincidence relationships along specific individual chromosomes 
(refs 8, 18 and this study). Also, the number of Zip3 foci shows crossover home- 
ostasis as defined in strains with altered DSB levels (refs 8, 53 and this study), where 
homeostasis is dependent upon the presence of crossover interference (refs 8, 24 
and this study). In contrast to Zip2/3 foci, total crossovers show much weaker 
interference. 

Third, our beam-film model can accurately explain total crossover patterns 
(including coefficient of coincidence relationships and the event distribution for 
total crossovers) by assuming that Zip2/3 foci mark the sites of patterned (class I) 
crossovers; that class II crossovers represent ~30% of total crossovers; and, fur- 
thermore, that class II crossovers arise from the interactions that are ‘leftover’ after 
the operation of crossover designation and interference. These ‘leftover’ interactions 
are usually matured without exchange of flanking markers; that is, to ‘non-crossover’ 
products. However, as proposed in ref. 15 and modelled in our analysis, 
these interactions may sometimes proceed to a crossover outcome instead of a 
non-crossover outcome, thus giving class II crossovers. Such a mixture of non-crossovers 
and a few crossovers would make the outcome for leftover meiotic interactions 
similar to the outcome of mitotic DSB repair. 

We also note that the term ‘non-interfering’ is misleading when applied to class 
II recombinational interactions. In budding yeast, as in several (possibly all) other 
organisms, total recombinational interactions tend to be evenly spaced along each 
bivalent. As a result, not only will patterned/class I crossovers exhibit interference, 
so too will total interactions and class II crossovers; moreover, class II crossovers 
will interfere with patterned (class I) crossovers. 

Fourth, both Zip2 and Zip3 foci occur specifically on the association sites 
between homologues in zip1Δ chromosomes. Analysis of Zip2 foci reveals 
that they exhibit interference. Moreover, they exhibit the same level of interference 
along zip1Δ chromosomes as along WT chromosomes when the metric of 
interference is physical distance (Fig. 5). 

We note that this robust cytological interference contrasts with the fact that, by 
genetic analysis, crossover interference is significantly compromised in a zip1Δ 
mutant (see, for example, refs 44, 54). It can also be noted that cytological and 
genetic studies were performed in different strain backgrounds (BR at 30 °C and 
SK1 at 30 °C, respectively). This is because (1) in BR at 30 °C, zip1Δ chromosomes 
are well formed to permit cytological analysis but meiosis arrests during prophase, 
thus precluding genetic analysis of recombination outcomes, whereas (2) 
in SK1 at 30 °C, zip1Δ chromosomes are less well formed, thus making cytological 
analysis more difficult, but meiosis does not arrest, thus permitting genetic 
analysis. 

One possible explanation for the absence of genetic interference in the zip1Δ 
mutant can be excluded. In principle, crossover designation and interference 
might occur normally and then be followed by a crossover-specific ‘maturation 
defect’; that is, a defect in the probability that designated interactions will actually 
mature to detectable crossovers. This explanation is not tenable because, in 
such a situation, the detectable crossovers that do manage to form will still exhibit 
normal interference. By contrast, a diagnostic maturation effect can be seen in an 
mlh1Δ mutant. 

Two other, not mutually exclusive, explanations for absence of genetic interference 
in zip1Δ can be suggested, as follows. 
• In WT meiosis, crossover interference is fundamentally a structure-based process 
to which DNA events are biochemically coupled as a downstream consequence. 
By this view, Zip1 would not be required for local ‘crossover designation’ and 
interference at the structural level but would be required either (1) to set up coupling 
between crossover/non-crossover decisions and biochemical events and/or (2) to 
transduce the structural interference signal into the appropriate biochemical outcome. 
It appears that crossover designation is a specifically programmed outcome 
and interactions that are not crossover-designated mature instead to non-crossovers 
as the default option. It further appears that some of these ‘non-crossover-fated’ 
interactions may actually mature into crossover products, thus giving the ‘non-patterned’ 
crossovers that are not marked by Zip3 foci. Thus, in possibility (1), all 
interactions might progress to the ‘non-crossover’ outcome, giving an increase in 
non-crossovers and some crossovers as well, with those crossovers exhibiting the 
same distribution as total precursor interactions. This is, in fact, the phenotype 
observed at the HIS4LEU2 hot spot in SK1 zip1Δ at 30 °C. In possibility (2), 
crossover/non-crossover differentiation would occur at the biochemical level but 
there would be no progression of crossover-fated interactions. This is, in fact, the 
phenotype observed at the HIS4LEU2 hot spot in SK1 zip1Δ at 33 °C (ref. 14). 
• A reduction in the frequency of mature patterned (class I) crossovers might be 
accompanied by an increase in the frequency of crossovers from other sources, for 
example occurrence of additional DSBs, some of which then give rise to crossovers. 
Attempts to model this situation with beam-film simulations suggest that 
the level of extra events required to confer the strong defect in crossover interference 
observed in zip1Δ is very high (L.Z., unpublished observations). Thus, this 
effect may contribute to, but not be the sole basis for, absence of crossover 
interference in zip1Δ. 

Fifth, localization of Zip3 along yeast chromosomes has been evaluated molecularly 
by chromatin immunoprecipitation (ChIP) analysis. This analysis identifies 
peaks and valleys of Zip3 abundance, genome wide, at different times of 
meiosis, and relates the positions of those peaks to peaks of Rec8 and Red1 
(markers for chromosome axes at mid-prophase) and to peaks corresponding 
to DSB sites (marked by single-stranded (ss)DNA in a dmc1Δ strain). Zip3 is 
initially most prominent at centromere regions. This localization, which corre- 
sponds to the early leptotene Zip1 centromere association seen cytologically, is 
independent of DSB formation; it is prominent at t = 3 h, about the time of DSB 
formation; and it mostly disappears by t = 5h, the time of pachytene when Zip3 
foci are assayed here. Correspondingly, we find no tendency for Zip3 foci to occur 
at centromeres in pachytene (L.Z., unpublished observations). At t= 4 and 5h, 
Zip3 appears in co-localization with chromosome axis markers and DNA DSB 
sites. Axis-localization slightly precedes DSB site localization and remains high 
while DSB site localization increases prominently, apparently in correlation with 
post-crossover-designation crossover-specific events. It is very difficult to relate 
ChIP results to cytological focus analysis, for several 
reasons. (1) ChIP analysis looks at a population average localization, not a per- 
nucleus localization. (2) At t = 4h, most cells are in leptotene/zygotene, which we 
do not examine cytologically. Moreover, even at t = 5h, only ~50% of cells are in 
pachytene. Thus, ChIP data include significant signals from irrelevant stages. (3) 
The resolution of ChIP analysis is ~1-5 kb, with axis-association sites tending to 
alternate with DSB sites at separations of 5-10 kb (refs 11, 57). In contrast, Zip3 
foci extend ~300 nm along the chromosome (0.3 ± 0.06 μm; n = 320), which 
corresponds to ~90 kb in the present study (average for chromosomes III, IV and 
XV). Thus, a single Zip3 focus can encompass multiple axis association and DNA 
DSB sites. Correspondingly, ChIP analysis may well be detecting sub-focus level 
alterations within a crossover-designated region that reflect changes in the intim- 
ate molecular crosslinkability of Zip3 molecules to different types of DNA seg- 
ment without any change in the position of the associated Zip3 focus. For 
example, the finding of more prominent ChIP localization to DSB sites in 
mutants that progress farther into recombination may reflect the extent to which 
those sequences are no longer buried within earlier recombination complexes. (4) 
To complicate matters further, it is clear cytologically that a low level of Zip3 
localizes all along pachytene chromosome axes beyond that present in prominent 
foci. This general background will be detected in ChIP analysis but not by Zip3 
focus analysis. 

Visualization and definition of synaptonemal complex lengths and Zip3 focus 
positions (additional details in ref. 8). Meiotic time courses and sample preparation. 
Appropriately pre-grown cell cultures were taken through synchronous 
meiosis by the SPS method, with meiosis initiated by transfer of cells to sporulation 
medium (t = 0). Cells were harvested at t ≈ 4–5 h, the time at which 
pachytene cells are most abundant (constituting approximately 50% of all cells). 
Harvested cells were spheroplasted to remove the cell wall and then re-suspended 
in MES wash (1 M sorbitol, 0.1 M MES, 1 mM EDTA, 0.5 mM MgCl2, pH 6.5). Cells 
were then lysed and spread on a glass microscope slide with 1% Lipsol (LIP) and 
fixed by 3% w/v paraformaldehyde with 3.4% w/v sucrose as described in ref. 60. 

Fluorescence visualization. Glass slides with spread nuclei were incubated at 
room temperature for 15 min in 1× Tris-buffered saline (TBS) buffer (25 mM 
Tris-Cl, pH 8, 136 mM NaCl, 3 mM KCl) then blocked with 1× TBS buffer with 
1% w/v bovine serum albumin (BSA) for 10 min. Chromosomes in spread nuclei 
were then stained with appropriate antibodies. Primary antibodies were mouse 
monoclonal anti-myc (for detection of Zip3-Myc), goat polyclonal anti-Zip1 
(Santa Cruz) and rabbit polyclonal anti-GFP, diluted 1:1,000 in 1× TBS with 
1% BSA. Secondary antibodies were anti-mouse, anti-goat and anti-rabbit IgG 
labelled with Alexa Fluor 488, 594 or 555 (Molecular Probes), respectively; all 
were diluted 1:1,000 in 1× TBS with 1% BSA. Slides were mounted in Prolong 
Gold antifade (Molecular Probes). For condensin mutants and spo11 hypomorphs 
with very low DSB levels, Zip1 staining was less bright than in WT, so 
axes were usually visualized by immunostaining of Rec8-3HA with rat anti-HA 
primary antibody and anti-rat labelled with Alexa Fluor 647 or 594 secondary 
antibody. Control experiments confirmed that the same synaptonemal complex 
lengths and Zip3 focus numbers/distributions/coefficient of coincidence relation- 
ships were obtained with either Zip1 or Rec8 staining. Stained chromosome 
spreads were visualized on an Axioplan IEmot microscope (Zeiss) using appro- 
priate filters. Images were collected using Metamorph (Molecular Devices) image 
acquisition. 

Defining Zip3 focus positions and synaptonemal complex lengths. Images for 
Zip3, Zip1 (or Rec8) and LacO/LacI-GFP staining (text Fig. 1a, b) were merged 
and aligned. The GFP-marked chromosome was analysed in nuclei where it was 
unambiguously separated from other chromosomes. The segmented line-tracing 
tool of Image J software (National Institutes of Health) was used. Each trace was 
initiated at the centre of the GFP focus, which typically falls beyond the end of the 
synaptonemal complex (white line in Fig. 1b). The trace was continued following 
the path of the Zip1 (Rec8) signal for the entire length of the chromosome. As the 
trace encountered a position judged (by eye) to be the centre of a Zip3 focus, that 
position was annotated using the ‘mark position’ function (control M). By appli- 
cation of the ‘zoom’ function, the annotated position of each Zip3 focus could be 
defined at the one-pixel level (~0.067 μm under our microscope). The distal end 
of the Zip1 (Rec8) signal was also annotated. Synaptonemal complex length was 
given by the annotated position mark at the end of the trace. Importantly, by this 
approach, each Zip3 focus (and the value for total synaptonemal complex length) 
was subject to its own positioning error (evaluated below) with no accumulation of 
error along the trace. 

Accuracy of Zip3 focus (synaptonemal complex length) positions. The accu- 
racy of the results obtained by the above approach was evaluated in several ways. 
(1) Coefficient of coincidence curves are highly reproducible in multiple experi- 
ments of the same strain, as shown by the correspondence of coefficient of coin- 
cidence values among different chromosomes (Fig. 1d) and for four independent 
analyses of a single chromosome. (2) The intensity of Zip3 can be determined 
quantitatively along the trace and the positions of intensity peaks compared with 
the positions of foci defined by eye. The two methods give virtually identical 
results except that the eye can distinguish a significant number (~5%) of foci 
that are not, or less, obvious in the trace (for example, as shoulders on major 
peaks). (3) To determine the precision with which each focus position (or each 
synaptonemal complex length) is defined in a given trace, chromosome XV was 
traced six times in each of four nuclei. The four bivalents exhibited four Zip3 foci 
(one case) or five Zip3 foci (three cases). The variation in the absolute position of a 
given focus (or synaptonemal complex length) among a set of six duplicate traces 
ranged from 0 to 0.14 μm with an average of 0.08 μm (80 nm). Furthermore, for 
each focus among six traces, the standard deviation of this variation ranged from 
0.02 to 0.04 μm. In summary, the absolute position of each Zip3 focus (or total 
synaptonemal complex length) for a given traced bivalent is specified with an 
accuracy of approximately one pixel (67 nm). 

We also performed reconstruction experiments to assess the possible effects of 
one-pixel accuracy on coefficient of coincidence curves. For four WT and two 
pCLB2-TOP2 experimental data sets, independently, Zip3 focus positions were 
subjected to computational ‘adjustment’, with the position of each focus moved 
by one pixel in one direction or the other, randomly for different foci. The 
coefficient of coincidence curve was then re-calculated. The values of L_CoC were 
not changed (0.3 ± 0.01 μm before and after ‘adjustment’; further discussion of 
the accuracy of the coefficient of coincidence curves below). There were very 
subtle changes in the shape of the coefficient of coincidence curve. However, 
the nature of these changes in fact suggests that the relationships from the 
position-randomized data set represent a degradation of the more robust interference 
relationships observed in the primary data. (1) At smaller inter-interval distances 
(<0.2 μm), coefficient of coincidence values are slightly higher. This is expected 
by the fact that randomized movement will artificially increase the fraction of 
closer-together focus pairs. (2) At larger inter-interval distances, coefficient of 
coincidence values fail to rise above one. This is expected because randomized 
movement will reduce the tendency for the inter-focus position to exhibit a node 
at the most likely inter-crossover position(s) (further explanation in next section). 
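
The computational ‘adjustment’ described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the seed handling and the example focus positions are hypothetical, and only the one-pixel size (0.067 μm) is taken from the text.

```python
import random

PIXEL_UM = 0.067  # size of one pixel in micrometres, as stated in the text


def jitter_foci(foci_per_chromosome, pixel=PIXEL_UM, seed=None):
    """Move every focus by exactly one pixel, left or right at random.

    `foci_per_chromosome` is a list of lists of focus positions (micrometres),
    one inner list per traced bivalent. The direction is drawn independently
    for each focus.
    """
    rng = random.Random(seed)
    return [[pos + rng.choice((-pixel, pixel)) for pos in foci]
            for foci in foci_per_chromosome]


# Hypothetical focus positions (micrometres) for three bivalents.
original = [[0.40, 1.10, 1.90], [0.70, 1.60], [0.30, 0.95, 1.70, 2.40]]
adjusted = jitter_foci(original, seed=1)
```

The coefficient of coincidence curve would then be recomputed from `adjusted` and compared with the curve from `original`.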
Analysis of Zip3 focus (crossover) patterns: coefficient of coincidence and 
modified coefficient of coincidence relationships. Coefficient of coincidence 
relationships (see, for example, Fig. 1d). The coefficient of coincidence analysis 
is the classic indicator of crossover interference. If done correctly (with a sufficiently 
large number of intervals) with a sufficiently large data set, coefficient of 
coincidence curves provide a highly accurate description of crossover patterns 
(discussion in ref. 8). We note that, in contrast, mathematical analysis of ‘even- 
ness’ by application of the gamma distribution, while ‘model-independent’, can 
give a misleading impression with respect to mutant phenotypes or other types of 
variation (discussion in ref. 8). For example, a defect in maturation of crossovers 
after their positions have been designated has no effect on interference and thus 
does not affect coefficient of coincidence relationships but significantly alters the 
value of the gamma ‘evenness’ parameter. Coefficient of coincidence curves for 
Zip3 foci were obtained using the ‘Analyze crossover data’ feature of the beam-film 
program, using as an input the experimentally defined positions of Zip3 foci 
in a given experiment. For this purpose, chromosomes are divided into a number 
of intervals with equal size (detailed discussions in ref. 8, protocol S1). For each 
interval the total frequency of Zip3 foci in the set of chromosomes examined is 
determined. Then, for each pair of intervals, the observed frequency of chromo- 
somes exhibiting a Zip3 focus in both intervals (referred to for convenience as 
‘double crossovers’) is determined. This value defines the frequency of ‘observed 
double crossovers’. If crossovers (Zip3 foci) arise independently in each interval, 
the predicted frequency of double crossovers for a given pair of intervals should be 
the product of the frequencies of crossovers (Zip3 foci) in the two intervals 
considered individually. This product is the frequency of ‘expected double cross- 
overs’. The coefficient of coincidence for that particular pair of intervals is the 
ratio of these two frequencies, that is the observed/expected ratio for that interval 
pair. A coefficient of coincidence curve is obtained by considering all possible 
pairs of intervals, with the coefficient of coincidence value for each pair plotted as 
a function of the distance between (the midpoints of) the two corresponding 
intervals. For a classic coefficient of coincidence curve, at very small inter-interval 
distance, the coefficient of coincidence is close to zero, indicating very strong cross- 
over interference. As the inter-interval distance increases, the coefficient of coincid- 
ence also gradually increases, indicating that crossover interference decreases with 
increased inter-interval distance. Eventually, the coefficient of coincidence value 
reaches one, implying that, at the corresponding inter-interval distance, crossover 
interference no longer has any influence. At certain specific larger inter-interval 
distances, the coefficient of coincidence value tends to be greater than one, implying 
that, at these distances, there is a higher probability of double crossovers than pre- 
dicted on the basis of independent occurrence. Nodes of coefficient of coincidence 
greater than 1 tend to occur at inter-interval distances that correspond approxi- 
mately to the average inter-crossover distance and multiples thereof (see ref. 8 for 
more examples). This pattern reflects the fact that operation of crossover interference 
tends to create an evenly spaced array of crossovers (Zip3 foci, in this analysis). 
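
The coefficient of coincidence calculation described above can be written compactly. The following is a minimal sketch, not the ‘Analyze crossover data’ routine of the beam-film program: the function names are hypothetical, and averaging over all interval pairs at the same separation (in `coc_curve`) is an assumption made here for plotting convenience.

```python
from collections import defaultdict
from itertools import combinations


def occupancy_matrix(foci_per_chromosome, length, n_intervals):
    """Mark, for each chromosome, which equal-sized intervals contain a focus."""
    width = length / n_intervals
    rows = []
    for foci in foci_per_chromosome:
        row = [False] * n_intervals
        for pos in foci:
            # Clamp so that a focus exactly at the distal end falls in the last interval.
            row[min(int(pos / width), n_intervals - 1)] = True
        rows.append(row)
    return rows


def coc_points(foci_per_chromosome, length, n_intervals):
    """Return (inter-interval distance, CoC) for every pair of intervals."""
    rows = occupancy_matrix(foci_per_chromosome, length, n_intervals)
    n = len(rows)
    width = length / n_intervals
    # Frequency of chromosomes with a focus in each interval.
    freq = [sum(row[k] for row in rows) / n for k in range(n_intervals)]
    points = []
    for i, j in combinations(range(n_intervals), 2):
        expected = freq[i] * freq[j]  # 'expected double crossovers'
        if expected == 0:
            continue  # CoC is undefined when either interval never has a focus
        observed = sum(row[i] and row[j] for row in rows) / n  # 'observed double crossovers'
        points.append(((j - i) * width, observed / expected))
    return points


def coc_curve(points):
    """Average CoC over all interval pairs that share the same distance."""
    grouped = defaultdict(list)
    for distance, coc in points:
        grouped[round(distance, 9)].append(coc)
    return sorted((d, sum(v) / len(v)) for d, v in grouped.items())
```

Here `foci_per_chromosome` holds one list of focus positions per traced bivalent, all measured in the same unit as `length`.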
For convenience, the inter-interval distance at which the coefficient of coincidence 
= 0.5 is defined as L_CoC and can be used as a measurement for ‘crossover 
interference strength’, by which is meant the effective distance over which crossover 
interference acts. Importantly, at a mechanistic level, variations in L_CoC can 
result from variations in features other than the distance over which the interference 
signal spreads (for example, as discussed for beam-film simulations below). 
Values of L_CoC are highly reproducible from one experiment to another. For the 
three analysed chromosomes in WT meiosis, values for individual experiments 
and the averages and standard deviations are as follows: chromosome XV, 0.31, 0.3, 
0.32, 0.32 (0.31 ± 0.01; n = 4); chromosome III, 0.31, 0.32, 0.3 (0.31 ± 0.01; n = 3); 
chromosome IV, 0.31, 0.32 (0.32 ± 0.01; n = 2). Further documentation is in ref. 8. 
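
One way to read L_CoC off a coefficient of coincidence curve is linear interpolation between the two points that bracket 0.5. The sketch below assumes that approach, which the text does not specify, and uses a hypothetical function name. The replicate summary uses the chromosome XV values listed above and assumes that the reported spread is a sample standard deviation.

```python
import statistics


def l_coc(curve, level=0.5):
    """Interpolate the distance at which a CoC curve first reaches `level`.

    `curve` is a list of (inter-interval distance, CoC) pairs. Returns None
    if the curve never reaches `level`.
    """
    pts = sorted(curve)
    if not pts:
        return None
    if pts[0][1] >= level:
        return pts[0][0]
    for (d0, c0), (d1, c1) in zip(pts, pts[1:]):
        if c0 < level <= c1:
            # Linear interpolation between the two bracketing points.
            return d0 + (level - c0) * (d1 - d0) / (c1 - c0)
    return None


# Replicate L_CoC values (micrometres) for chromosome XV in WT, from the text.
xv = [0.31, 0.30, 0.32, 0.32]
summary = (round(statistics.mean(xv), 2), round(statistics.stdev(xv), 2))
```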
Modified coefficient of coincidence analysis (Fig. 1e). As an alternative approach 
to evaluating the effective interference distance, we adapted the ‘modified coefficient 
of coincidence’ approach previously described for analysis of genetic crossover 
data. For the present purpose, each interval is used as a reference (Ref; Fig. 1e 
top left). Chromosomes are then divided into two groups: those with or without a 
crossover (Zip3 focus) in this reference interval (CO+_R or CO−_R). Another nearby 
interval is then selected as a test (Test (T)). For each reference group (CO+_R or 
CO−_R), the numbers of chromosomes with and without a crossover in this test 
interval are determined (CO+_T and CO−_T). If crossover levels are lower in the CO+_R 
group than in the CO−_R group, the presence of a crossover in the reference interval 
has reduced the probability of a crossover in the Test interval; that is, interference 
emanating from the reference interval has been felt in that Test interval. When this 
evaluation is performed for all intervals in the vicinity of a given reference interval, 
it reveals the distance over which interference extends outward from that interval, 
giving L_MCoC for that reference interval (Fig. 1e, top right). Determination of 
L_MCoC values for all intervals along each of the three analysed chromosomes gives 
an average L_MCoC for that chromosome (Fig. 1e, bottom right). 

This analysis requires an evaluation, for each comparison between a reference
interval and a test interval, of whether the relative frequencies of CO⁺_T and CO⁻_T
chromosomes are the same for the CO⁺_R and CO⁻_R groups or different (that is,
lower in the CO⁺_R group). For this purpose, Fisher’s exact test was applied. Since
interference is stronger (and thus more likely to be statistically significant) at
shorter distances, the more stringent the probability specified by Fisher’s exact
test, the shorter the inferred ‘interference distance’. The standard criterion for
significance by this method is P < 0.05. By this criterion, L_MCoC for the three
analysed chromosomes in WT meiosis was 0.3 µm, which is the same as L_CoC as defined
above. With a more stringent criterion, P < 0.01, L_MCoC is slightly shorter (0.25 µm).
Importantly, mutants with decreased interference distance always showed decreased
L_MCoC compared with WT regardless of whether the standard or the more stringent
criterion was applied. Thus, when P < 0.05, L_MCoC in top2 mutants versus WT was
1.3 intervals versus 1.9 intervals (that is, 0.2 µm versus 0.3 µm); when P < 0.01,
L_MCoC in top2 mutants versus WT was 1.0 versus 1.5 intervals (that is, 0.16 µm
versus 0.25 µm). Given that P < 0.05 is the standard value applied for Fisher’s exact
test and the fact that L_CoC and L_MCoC correspond at P < 0.05, we adopted this level
of stringency to describe L_MCoC in the present analysis (Figs 1, 2 and 4 and Extended
Data Fig. 3).
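
The reference/test comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study: the function names (fisher_exact_less, interference_reach), the one-sided form of Fisher's exact test and the example counts are all assumptions made for illustration. Each 2 × 2 table holds, in order, the numbers of CO⁺_R chromosomes with and without a crossover in the test interval, followed by the same two numbers for CO⁻_R chromosomes.

```python
from math import comb

def fisher_exact_less(a, b, c, d):
    """One-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    a, b: CO+_R chromosomes with / without a crossover in the test interval
    c, d: CO-_R chromosomes with / without a crossover in the test interval
    Returns the hypergeometric probability of observing a or fewer
    test-interval crossovers in the CO+_R group.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)  # smallest feasible top-left count
    favourable = sum(comb(col1, k) * comb(total - col1, row1 - k)
                     for k in range(lowest, a + 1))
    return favourable / comb(total, row1)

def interference_reach(tables, alpha=0.05):
    """Count consecutive test intervals (nearest first) with significant interference.

    Hypothetical helper: tables are ordered by increasing distance from the
    reference interval, and counting stops at the first non-significant one.
    """
    reach = 0
    for table in tables:
        if fisher_exact_less(*table) >= alpha:
            break
        reach += 1
    return reach

# Made-up counts for three test intervals at increasing distance.
example = [(1, 9, 8, 2), (2, 8, 8, 2), (5, 5, 5, 5)]
reach_standard = interference_reach(example, alpha=0.05)
reach_stringent = interference_reach(example, alpha=0.01)
```

With these made-up counts the inferred reach is two intervals at P < 0.05 and one interval at P < 0.01, which mirrors the point made in the text that a more stringent criterion yields a shorter inferred interference distance.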
Beam-film simulations. The beam-film model and the program used for simulations
are described in detail in refs 4 and 8. The beam-film program was recently
rewritten in MATLAB (R2010a) and is downloadable at
https://app.box.com/s/hv91q2nrtq0cp9n8iy9m.

Outline of the beam-film model. An array of precursor interactions comes 
under global stress, which causes a first (most sensitive) precursor to go critical, 
undergoing a stress-promoted change that commits it to becoming a crossover 
(‘crossover designation’). The intrinsic effect of this change will be a local reduc- 
tion in the level of stress at the site of the change. To even out distribution of stress 
along the chromosome, the initial local reduction in stress then redistributes 
outwards in both directions, thus reducing the probability that any subsequent 
crossover designation(s) will occur in the affected region. This effect constitutes 
crossover interference. Assuming that the system does not comprise a single 
elastic component, the extent of stress reduction will dissipate with increasing 
distance away from the nucleation site, becoming negligible over a characteristic 
distance (corresponding to the ‘interference distance’). A second crossover des- 
ignation may then occur. If so, that crossover will occur preferentially at a position 
that retains a high stress level and thus preferentially at some distance away from 
the position of the prior crossover designation. This second crossover designation 
will again result in local stress relief and redistribution (and thus interference), 
giving a new stress landscape along the chromosome. If/as additional events 
occur, they will tend to fill in the holes between prior events, thus giving an evenly 
spaced array. The beam-film model predicts the number and array of crossovers 
that will occur in a particular system with particular mechanical properties that are
analogous to a known system in the physical world (the ‘beam-film system’). In 
this particular system, the magnitude of the stress reduction decreases exponen- 
tially with distance away from its nucleation point. 
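
The logic of this outline can be illustrated with a toy simulation. The sketch below is not the beam-film program referred to above, and it does not reproduce its mechanics: the function name simulate_bivalent, the random precursor positions, the uniform distribution of precursor sensitivities and the designation threshold of 1 are all assumptions chosen for illustration. It keeps only the features described in the text: global stress rises to a maximum, the most stressed precursor is designated first, and each designation relieves stress around it by an amount that decays exponentially with distance over a characteristic length L.

```python
import math
import random

def simulate_bivalent(n_precursors, length_um, L_um, s_max, rng):
    """Return sorted crossover positions (in micrometres) on one toy bivalent."""
    positions = [rng.uniform(0.0, length_um) for _ in range(n_precursors)]
    # Assumption: each precursor has its own sensitivity to stress.
    sensitivity = [rng.uniform(0.5, 1.0) for _ in range(n_precursors)]
    designated = []
    remaining = list(range(n_precursors))

    def local_level(i):
        # Fraction of stress left at precursor i after relief from every
        # earlier designation, each decaying as exp(-distance / L).
        level = sensitivity[i]
        for j in designated:
            level *= 1.0 - math.exp(-abs(positions[i] - positions[j]) / L_um)
        return level

    while remaining:
        # As global stress rises, the precursor with the highest local level
        # reaches the threshold first.
        best = max(remaining, key=local_level)
        if s_max * local_level(best) < 1.0:
            break  # no remaining precursor goes critical below s_max
        designated.append(best)
        remaining.remove(best)
    return sorted(positions[i] for i in designated)

rng = random.Random(1)
toy_positions = simulate_bivalent(13, 3.2, 0.3, 4.0, rng)
```

In this toy version a very large L lets the first designation suppress all others, whereas a very small L leaves precursors effectively independent of one another, which are the two limits of the interference behaviour described above.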

Beam-film best-fit simulations. In beam-film simulation analysis, the para- 
meters of the beam-film model are varied to define the constellation of parameter 
values at which the predicted array of crossover events best matches that observed 
experimentally for a particular data set. As described in detail elsewhere (refs 4 and 8), the
parameters to be specified fall into three categories that describe, respectively, the 
following: (1) the array of precursor interactions upon which crossover pattern- 
ing acts; (2) the nature of the patterning process per se; and (3) the probability that 
a crossover-designated interaction will actually mature to an experimentally 
detectable crossover or crossover marker (that is, a Zip3 focus). 

For modelling, the level of global stress is progressively increased up to a 
maximum specified level (S_max). As the level of stress increases, precursors will
undergo crossover designation sequentially in relation to their relative local stress 
levels at that moment in the sequence of events (differently for different bivalents 
according to their specific histories). Each crossover designation triggers reduc- 
tion in stress, in both directions, over a characteristic length given by a specific 
parameter (L). The value of L for a particular simulation is directly reflected in the
resultant coefficient of coincidence relationships and corresponds very closely to
the inter-interval distance at which the coefficient of coincidence = 0.5, defined
here as L_BF. A third patterning parameter (‘A’) describes precursor reactivity: that
is, the way in which the probability of crossover designation varies as a function of 
the local stress level at the corresponding position. A fourth patterning parameter 
(‘clamping’) permits adjustment of crossover probabilities near chromosome 
ends. 
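
The relationship between crossover positions, the coefficient of coincidence and the distance at which it reaches 0.5 can be sketched as follows. The sketch uses the standard definition of the coefficient of coincidence (observed frequency of double crossovers in a pair of intervals divided by the product of the two single-interval frequencies). The function names coc_curve and distance_at_half, the equal-width intervals, the averaging over interval pairs and the linear interpolation are assumptions made for illustration and are not taken from the beam-film program.

```python
def coc_curve(bivalents, length_um, n_intervals):
    """Map inter-interval distance (micrometres) to mean coefficient of coincidence.

    bivalents: one list of crossover positions (micrometres) per bivalent.
    """
    width = length_um / n_intervals
    n = len(bivalents)
    # hits[b][i] is True when bivalent b has a crossover in interval i.
    hits = []
    for crossovers in bivalents:
        row = [False] * n_intervals
        for x in crossovers:
            row[min(int(x / width), n_intervals - 1)] = True
        hits.append(row)
    freq = [sum(row[i] for row in hits) / n for i in range(n_intervals)]
    by_gap = {}
    for i in range(n_intervals):
        for j in range(i + 1, n_intervals):
            expected = freq[i] * freq[j]
            if expected == 0:
                continue  # pair is uninformative
            observed = sum(row[i] and row[j] for row in hits) / n
            by_gap.setdefault(j - i, []).append(observed / expected)
    return {gap * width: sum(v) / len(v) for gap, v in sorted(by_gap.items())}

def distance_at_half(curve):
    """Linearly interpolate the distance at which the curve first rises to 0.5."""
    points = sorted(curve.items())
    for (d0, c0), (d1, c1) in zip(points, points[1:]):
        if c0 < 0.5 <= c1:
            return d0 + (0.5 - c0) * (d1 - d0) / (c1 - c0)
    return None  # the curve never crosses 0.5 within the sampled range

# Made-up positions on four bivalents, each 4 micrometres long.
toy = [[0.5, 2.5], [1.5, 3.5], [0.5, 3.5], [1.5]]
toy_curve = coc_curve(toy, length_um=4.0, n_intervals=4)
toy_half = distance_at_half(toy_curve)
```

For this made-up data set, adjacent intervals never share crossovers (coefficient of coincidence 0), and the interpolated crossing of 0.5 falls between one and two interval widths.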

Parameter values for beam-film best-fit simulations of crossovers (Zip3 foci)
along WT yeast chromosomes are described in ref. 8. The best-fit simulations for
mutant patterns presented in Figs 2a, 3a–c and 4a, b (except mutants with altered
axis lengths) were obtained using these same parameter values except that the
value of L was appropriately reduced, from ~0.3 µm to ~0.2 µm, resulting in a
commensurate reduction in L_BF. Best-fit simulations in situations with altered
DSB levels (Fig. 2d) also involved changes in the number of precursors (N), as 
discussed below (‘crossover homeostasis analysis’) and in Extended Data Fig. 4. 
Best-fit simulations in mutants with altered axis lengths also involved changes in 
the number of precursors (N), as discussed in Extended Data Fig. 9. 
Crossover homeostasis analysis. Crossover homeostasis is a nonlinear relationship
between the number of DSBs and the number of crossovers. The existence and
magnitude of crossover homeostasis depend on the existence and strength of
crossover interference (see text and ref. 8).

Beam-film simulations of crossover homeostasis. A beam-film best-fit simu- 
lation predicts the number of crossovers that will occur if crossover designation 
and interference occur according to a specific set of values for involved parameters. 
To get a simulated crossover homeostasis curve under a particular set of condi- 
tions, multiple beam-film simulations were performed at different values of the 
precursor number N, which were varied over a desired range, and with the values of 
all other parameters held constant. The average numbers of crossovers predicted 
for each evaluated value of N were then plotted as a function of N. Such curves were 
then obtained analogously at different values for the interference distance L (ref. 8; 
Fig. 2d). 
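
The procedure for generating a homeostasis curve (vary N, hold all other parameters constant, average the predicted crossover number, repeat at a different L) can be sketched with a toy model. This is not the beam-film program: the evenly spaced precursors, the uniformly distributed sensitivities, the threshold of 1 and the names crossovers_on_bivalent and homeostasis_curve are all assumptions made for illustration, and the default bivalent length of 3.2 µm is an arbitrary illustrative value.

```python
import math
import random

def crossovers_on_bivalent(n, length_um, L_um, s_max, rng):
    """Count crossover designations on one toy bivalent with n precursors."""
    pos = [(i + 0.5) * length_um / n for i in range(n)]  # assumption: even spacing
    sens = [rng.uniform(0.5, 1.0) for _ in range(n)]     # assumption: random sensitivity
    chosen, remaining = [], list(range(n))

    def local_level(i):
        # Stress left at precursor i after exponentially decaying relief
        # from every earlier designation.
        level = sens[i]
        for j in chosen:
            level *= 1.0 - math.exp(-abs(pos[i] - pos[j]) / L_um)
        return level

    while remaining:
        best = max(remaining, key=local_level)
        if s_max * local_level(best) < 1.0:
            break
        chosen.append(best)
        remaining.remove(best)
    return len(chosen)

def homeostasis_curve(n_values, L_um, length_um=3.2, s_max=4.0, trials=200, seed=0):
    """Mean crossover number per bivalent for each precursor number N."""
    rng = random.Random(seed)
    return {n: sum(crossovers_on_bivalent(n, length_um, L_um, s_max, rng)
                   for _ in range(trials)) / trials
            for n in n_values}

# One curve per interference distance, over the same range of N.
curves = {L: homeostasis_curve(range(2, 27, 4), L) for L in (0.2, 0.3)}
```

The two limiting cases show why homeostasis depends on interference in this toy model: with an extremely long L every bivalent receives exactly one crossover whatever the value of N, whereas with a negligible L the crossover number simply equals N.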

Experimental evaluation of crossover homeostasis by Zip3 focus analysis. The 
positions of Zip3 foci were determined along specific marked chromosomes (XV 
and III) in a series of strain backgrounds known to give varying levels of DSBs, in
both a TOP2 and a pCLB2-TOP2 background. Coefficient of coincidence relation- 
ships and the numbers and distributions of Zip3 foci per bivalent for all strains are 
given in Figs 1 and 2 and Extended Data Figs 2 and 4. Average Zip3 focus numbers 
per chromosome (average ± s.d.) are shown in Fig. 2d and listed in the legend to
Extended Data Fig. 4. 

DSB levels were decreased below WT levels by a previously described series of
hypomorphic spo11 alleles (spo11HA, spo11YFHA, spo11DAHA; ref. 24). DSB levels
were increased above WT levels using a tel1Δ mutation, alone and in combination
with a spo11 hypomorph (tel1Δ spo11HA). The average numbers of Zip3 foci per
bivalent in the different strains were then plotted as a function of beam-film
precursor or DSB level (discussion below). This analysis was performed in strain
backgrounds that either (1) were WT for crossover interference (TOP2) or (2) carried
the pCLB2-TOP2 construct that resulted in meiotic depletion of topoisomerase II
(see text).

The number of DSBs per bivalent in a TOP2 strain with WT DSB formation can
be accurately determined by comprehensive evaluation of results from
DSB mapping (for example, ref. 12), microarray analysis (for example, ref. 45) and
classic genetic measurements (http://www.yeastgenome.org). The numbers of DSBs on
chromosomes III, IV and XV are thus defined as 6, 19 and 13, respectively. The
relative levels of DSBs in strains carrying spo11 mutations have been evaluated in a
TOP2 background by gel electrophoresis in a rad50S background (where DSBs
do not turn over). In the tel1Δ mutant, DSBs are increased by ~50% at the
HIS4LEU2 locus in a rad50S background without significantly altering crossover
interference (Extended Data Fig. 7 and L.Z., unpublished observations).

However, in some regions and circumstances, rad50S DSB levels are known to
be lower than the level of DSBs in RAD50 meiosis (see, for example, refs 11, 12).
Furthermore, rad50S analysis of spo11/tel1Δ alleles in a pCLB2-TOP2 background
has not been performed. We therefore also evaluated DSB levels by application of
beam-film analysis. For all strains analysed for Zip3 focus patterns, both TOP2
and pCLB2-TOP2, best-fit beam-film simulations were defined (Figs 2–4 and
Extended Data Figs 2 and 4). For each strain, all parameter values were held
constant at those defined for the two SPO11 TEL1 cases (see text) except that
the average number of precursors per bivalent (N) was varied to determine the
value that gave the optimal match between observed and predicted crossover
patterns for that strain. Beam-film-predicted DSB/precursor levels were the same
for the TOP2 and pCLB2-TOP2 versions of all strains (Figs 2–4 and Extended
Data Fig. 4c). This prediction matches the experimental finding that TOP2 and
pCLB2-TOP2 strains exhibit the same level of total inter-homologue events
(crossover plus non-crossover) at HIS4LEU2 in a RAD50 SPO11 TEL1 background
(Extended Data Fig. 8). Furthermore, for TOP2 strains, DSB/precursor
values obtained by beam-film simulations are very similar to those obtained on the
basis of rad50S analysis (Extended Data Fig. 4c). Correspondingly, crossover
homeostasis relationships are very similar regardless of whether DSBs or
beam-film-predicted precursors are used as the metric (Fig. 2d and Extended Data Fig. 4d).

Interestingly, experimentally determined rad50S DSB levels tend to be slightly 
lower than those predicted by beam-film analysis, especially at lower DSB levels 
(Extended Data Fig. 4). Moreover, experimental data match beam-film-predicted 
crossover homeostasis relationships somewhat more accurately when the metric 
of the DSB level is the beam-film-predicted precursor level, especially at lower 
DSB/precursor levels (Extended Data Fig. 4d). This correspondence suggests that 
beam-film-predicted values may be more accurate than rad50S experimental 
values. Data in ref. 24 support this conclusion: at HIS4LEU2, a spo11HA/HA
strain exhibits 50% of the SPO11 level of rad50S DSBs but 62% of the level of
inter-homologue recombination products (crossover plus non-crossover), implying
a deficit of 20% by rad50S analysis. Similarly, a spo11HA/DA strain exhibits
20% of the SPO11 level of rad50S DSBs but 27% of the level of inter-homologue
recombination products, a deficit of 26%.
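
The arithmetic behind these deficits can be checked with a short calculation. The sketch assumes that the deficit is the fraction of recombination products not accounted for by measured rad50S DSBs, that is, 1 − (DSB level / product level). The text does not state the formula explicitly, so this definition and the function name dsb_deficit are assumptions.

```python
def dsb_deficit(dsb_percent, product_percent):
    """Fraction of recombination products not accounted for by measured DSBs.

    Both arguments are percentages of the SPO11 (wild-type) level.
    """
    return 1.0 - dsb_percent / product_percent

deficit_ha_ha = dsb_deficit(50, 62)  # spo11HA/HA
deficit_ha_da = dsb_deficit(20, 27)  # spo11HA/DA
```

Under this assumed definition the two values are about 0.19 and 0.26, in line with the deficits of roughly 20% and 26% quoted above.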

These analyses also provide further evidence (in addition to that presented in 
Extended Data Fig. 7) that the increased number of Zip3 foci seen in top2 mutants 
compared with TOP2 strains cannot be explained as increased DSBs. 
Extended Data Figure 1 | Top2 protein level and localization on
chromosomes in three top2 mutants. a, Top2 protein levels shown as a
function of time after entry into meiosis (t = 0). Top2 levels are severely
reduced in pCLB2-TOP2 (middle panel) and are the same as WT in pCLB2-TOP2
top2YF (±20% relative to anti-Pgk1 control). Western blot analysis used
anti-Top2 antibody (TopoGEN 2014) and anti-Pgk1 antibody (Abcam
ab113687). b, Immunostaining of Top2 on meiotic chromosomes with the
same antibody used for western blot analysis in a: at pachytene (shown) and at
leptotene (data not shown). Top2 is undetectable on chromosomes in pCLB2-TOP2
and is present at similar levels to WT in pCLB2-TOP2 top2YF and
top2SNM. Chromosomes were concomitantly immunostained for Zip1 (Santa
Cruz, sc-48716) as in text Fig. 1. Scale bars, 3 µm.
Extended Data Figure 2 | Decreased crossover interference in pCLB2-TOP2
and sir2Δ, slx5Δ is confirmed on other chromosomes. a, b, The same
decreases in crossover interference (L_CoC ~0.2 µm versus ~0.3 µm in WT)
observed in mutants on chromosome XV (Figs 2 and 3) are also observed on chromosomes
IV and III in pCLB2-TOP2 and sir2Δ, and on chromosome IV in slx5Δ. Data for
WT in black.
Extended Data Figure 3 | Decreased crossover interference as revealed by
modified coefficient of coincidence and tetrad analysis using the method of
ref. 21, but synaptonemal complex length is the same as in WT. a, By
modified coefficient of coincidence analysis (Fig. 1; Methods), crossover
interference can extend to about two intervals on either side of the reference
interval (L_MCoC ~0.3 µm) in WT and in three sir2 mutants that exhibit WT
crossover patterning by other criteria (L_CoC ~0.3 µm; Fig. 3 and Extended Data
Fig. 2). In contrast, in all examined single and double mutants where crossover
interference is defective (L_CoC ~0.2 µm; Figs 2–4), crossover interference
extends only about 1.3 intervals (L_MCoC ~0.2 µm) (for top2 mutants, see also
Fig. 2). Right column shows synaptonemal complex lengths for each of the
analysed strains (average ± s.d.). There is no significant difference between
strains exhibiting WT interference (average of averages is 3.25 ± 0.06 µm) and
strains defective in the top2 interference pathway (average of averages is
3.27 ± 0.07 µm). b, Decreased crossover interference in slx5Δ and sir2RK as
revealed by tetrad analysis. Each pair of intervals was tested, reciprocally, for the
ratio of the map distances in one interval with and without crossovers in the
other interval. Each number shows the average of the ratios for the two
reciprocal cases. A value less than 1 indicates crossover interference. Solid and
dotted lines indicate whether the level of interference is statistically (P < 0.05 by
G-test) significant or not, respectively. Genetic crossover interference is greatly
decreased in slx5Δ and sir2RK relative to WT on each of three chromosomes.
Tetrad data upon which this analysis is based are given in Supplementary
Table 2.
Extended Data Figure 4 | Additional aspects of crossover homeostasis
analysis. a, b, Crossover patterns along chromosome XV in TOP2 strains
(a, black) and pCLB2-TOP2 strains (b, black) with WT or altered DSB levels as
conferred by the indicated spo11/tel1 genotypes (for crossover homeostasis
analysis; Fig. 2d and Methods). All experimental data sets were also subjected to
beam-film simulation analysis (a and b, red). In all cases (a and b, red), best-fit
simulations were obtained by using the same parameters as those that give the
best fit for SPO11 TEL1 meiosis (ref. 8; Fig. 2a) except that the number of
precursors (given by parameter N) was altered to account for alterations in DSB
levels in the different strain backgrounds (L_BF = 0.3 µm in TOP2 background
versus 0.2 µm in pCLB2-TOP2 background; see Methods and below). For each
spo11/tel1 genotype, the best-fit value of N is the same in pCLB2-TOP2 as in
TOP2, thus confirming that the only change in various pCLB2-TOP2 strains
examined is a change in precursor number, with no change in interference. The
same results are also seen for beam-film simulations of analogous data for
chromosome III (not shown). These results further illustrate the accuracy with
which beam-film simulations can describe diverse crossover patterns.

c, Comparison of rad50S DSB levels and beam-film-predicted precursor levels
(N) for chromosome XV among strains with varying DSB levels (either
SPO11 TEL1 or carrying spo11 and/or tel1 mutant alleles). Top line: number of
DSBs genome-wide, relative to WT = 100, as defined by rad50S analysis in
TOP2 strains, either SPO11 TEL1 or carrying spo11 and/or tel1 mutant alleles
(details in Methods). Middle line: number of DSBs predicted for chromosome
XV. The number of DSBs in TOP2 SPO11 TEL1 was defined by several approaches
(details in Methods). DSBs per chromosome XV were predicted for spo11/tel1
mutant strains by comparison of rad50S DSB levels with SPO11 TEL1 (top line).
Bottom line: number of precursors predicted to be present by beam-film best-fit
simulation analysis (given by parameter N, above). Predicted values are the
same for TOP2 and pCLB2-TOP2 strain series (from simulations in a and b).
Note that in strains with lower total DSB levels, rad50S analysis gives lower
DSB/precursor levels than beam-film simulations (discussion in Methods).
Analogous results are obtained for chromosome III, as follows. (1) The
predicted values of N are the same for both TOP2 and pCLB2-TOP2 strain
series: N = 9 for tel1Δ, 6 for TEL1 SPO11, 5 for spo11-HA/spo11HA and 3 for
spo11-HA/spo11YF. (2) These predicted values of N correspond well to DSB
values predicted from rad50S analysis except at the lowest DSB levels: predicted
DSBs = 9 for tel1Δ, 6 for TEL1 SPO11, 5 for spo11-HA/spo11HA and 2 for
spo11-HA/spo11YF. d, Experimentally determined numbers of Zip3 foci from
the analyses of chromosome XV in a and b are plotted as a function of either the
number of precursors predicted by beam-film simulation analysis (left) or the
number of DSBs predicted by rad50S DSB analysis (right) (values from c).

e, Same as d, except that we analysed chromosome III. A slightly better match of
experimental data to beam-film simulation predictions is obtained when the x
axis metric is the predicted precursor number than when it is rad50S predicted
DSB levels, suggesting that beam-film simulations are more accurate than
rad50S DSB analysis, which is known to underestimate DSBs in several
situations. Note that for each strain and chromosome, Zip3 foci were analysed
in 200–300 cells. The average numbers of foci per bivalent ± s.d. as presented in
d and e were as follows. TOP2 chromosome XV (d): tel1Δ 5.21 ± 0.93; tel1Δ
spo11HA 4.92 ± 1.12; TEL1 SPO11 4.67 ± 1.16; spo11HA/spo11HA
4.11 ± 0.97; spo11HA/spo11DA 4.07 ± 1.07; spo11HA/spo11YF 3.51 ± 0.88.
pCLB2-TOP2 chromosome XV (d): tel1Δ 6.46 ± 1.13; TEL1 SPO11 5.96 ± 1.1;
spo11HA/spo11HA 5.29 ± 0.99; spo11HA/spo11DA 4.76 ± 0.94; spo11HA/
spo11YF 3.71 ± 0.98. TOP2 chromosome III (e): tel1Δ 2.16 ± 0.59; TEL1
SPO11 1.82 ± 0.55; spo11HA/spo11HA 1.7 ± 0.62; spo11HA/spo11YF
1.31 ± 0.66. pCLB2-TOP2 chromosome III (e): tel1Δ 2.49 ± 0.82; TEL1 SPO11
2.1 ± 0.87; spo11HA/spo11HA 2.07 ± 0.75; spo11HA/spo11YF 1.51 ± 0.69.
Extended Data Figure 5 | Increased level of SUMO-protein conjugates in
slx5Δ. a, Western blots for whole protein extracts in WT and slx5Δ probed
with anti-Smt3 antibody (Santa Cruz, sc-28649) and anti-Pgk1 antibody
(Abcam ab113687) as a function of time after entry into meiosis (t = 0).
Abundance of SUMO conjugates is increased in the mutant, especially in
regions of high molecular mass. b, Quantification of the gel in a.
Extended Data Figure 6 | The role of Sir2 in crossover interference is
specific to its interaction with Slx5. WT crossover interference is seen in
diverse sir2 non-null mutants affecting specific sub-functions (other than
sir2RK; Fig. 3) and in mutants deleted for various interaction partners. sir2-345
is defective in histone deacetylase activity; sir2ΔC500 lacks a Sir2 cohesion
role. sir3Δ, sir4Δ, esc2Δ and esc8Δ eliminate Sir2 interaction partners involved
in silencing; hst1Δ eliminates a Sir2 homologue.
Extended Data Figure 7 | Mutant coefficient of coincidence and crossover
number phenotypes cannot be explained by increased DSBs or by
prolongation of the crossover-designation stage. Mutants in the described
crossover interference pathway all confer coordinate changes in crossover
interference, which is reduced, and the total number of crossovers, which is
increased, by about 20% on chromosome XV. These are the expected
consequences of a single defect in crossover interference, as illustrated by
corresponding beam-film simulations, which quantitatively explain these
results by a change in a single parameter, the interference length (L_BF) (Figs 2
and 3). This interference defect could comprise a defect in generation and
spreading of the inhibitory signal and/or of the ability of unreacted precursors
to respond to that signal (see text and Methods (section ‘Beam-film
simulations’)). An increase in the number of crossovers can also occur as the
result of either (1) prolongation of the crossover-designation period or (2) an
increase in the number of DSBs. Neither of these effects can explain the mutant
phenotypes described in the text. (1) Crossover designation precedes
synaptonemal complex formation and thus the pachytene stage. Time-course
analysis of representative mutant strains reveals that, in sir2 mutants and in
top2SNM, meiosis proceeds through pachytene and the two meiotic divisions
normally (Extended Data Fig. 8a; ref. 14; data not shown). slx5/8 mutants and
pCLB2-TOP2 mutants show no delay in progressing through prophase to
pachytene (data not shown) but show a delay in meiosis I (slx5) or pachytene
arrest (pCLB2-TOP2) (Extended Data Fig. 8a; data not shown). The pCLB2-TOP2
top2YF mutant does show a delay in achieving pachytene, as well as
pachytene arrest, but exhibits the same crossover patterning phenotype as all
other mutants, which show no pre-pachytene delay. Thus, prolonged crossover
designation is not the basis for these phenotypes. (2) An increase in DSBs,
without any change in crossover interference, does increase the number of
crossovers; however, it has very little effect on crossover interference
relationships (coefficient of coincidence curves) in budding yeast.
Correspondingly, two lines of evidence show that the mutant defects described
here cannot be attributed to an increase in DSBs. a, A tel1Δ mutant exhibits
increased DSBs but no change in coefficient of coincidence relationships. TEL1
encodes the yeast homologue of ATM. Absence of Tel1 confers a 50% increase
in DSBs and a 10% increase in number of Zip3 foci (Supplementary Fig. 7 in
ref. 8; reproduced in Extended Data Fig. 7a left, red colour). However, (1) there
is no change in coefficient of coincidence relationships relative to WT
(Extended Data Fig. 7a left), (2) the increase in crossovers is precisely that
predicted on the basis of crossover homeostasis (ref. 8; text Fig. 2d, filled black
circle at 19 DSBs/precursors per chromosome XV) and (3) beam-film
simulation accurately describes the tel1Δ phenotype, relative to WT, by a
change in a single parameter: the level of DSBs (n = 19, grey, versus 13, gold, in
WT). The last point is documented in Extended Data Fig. 7a middle and right.
The middle panel in Extended Data Fig. 7a shows the beam-film best-fit 
simulation for WT chromosome XV, where n = 13 (gold), compared with the 
experimental coefficient of coincidence curve (black; from Fig. 1); the right 
panel shows the beam-film best-fit simulation for tel1Δ chromosome XV,
where n = 19 (grey) and all other parameters are the same as for WT, compared 
with the experimental coefficient of coincidence curve (red) from the left panel. 
b, Beam-film simulations predict no/little change in coefficient of coincidence 
with increasing DSBs for yeast chromosome XV (data not shown). More 
specifically, to explain the increased number of crossovers observed in the 
analysed mutants, for example pCLB2-TOP2, the value of N required for beam- 
film simulations of chromosome XV would be 26 (double the WT value of 
N = 13). If beam-film simulations are performed under the same parameter 
values used for WT except that N = 26 instead of N = 13, the predicted 
coefficient of coincidence curve is unchanged compared with that predicted 
for WT (left panel, compare gold for N = 13 with green for N = 26). 
Correspondingly, the coefficient of coincidence curve predicted for N = 26 
(green) matches the WT coefficient of coincidence curve (black) and is unlike 
the coefficient of coincidence curve for the mutant (pink) (right panel). 
Additional evidence that DSB number is not altered in pCLB2-TOP2 versus 
TOP2 is presented in Extended Data Figs 4 and 8. 
Extended Data Figure 8 | Progression of meiosis and of recombination in
interference-defective mutants. Representative mutants were examined for
progression of meiotic divisions and for recombination at the previously
characterized HIS4LEU2 locus (strains in Extended Data Table 1). a, Meiotic
divisions. The first meiotic division occurs normally in sir2RK (defective in
interaction with Slx5); it is delayed in slx5Δ and is completely absent in
pCLB2-TOP2 and pCLB2-TOP2 top2YF due to arrest at pachytene (L.Z., unpublished
observations). b, c, DNA events. The HIS4LEU2 locus probably provides a
direct readout of DNA events independent of the effects of interference.
HIS4LEU2 does not exhibit crossover homeostasis, which implies that it is not
sensitive to crossover interference. This feature presumably reflects the fact
that this locus is a very strong DSB hot spot. A DSB occurs at this site in virtually
every nucleus with a concomitant reduction in DSBs (and thus crossover
precursors) at other positions in its vicinity (N.K., unpublished observations).
This locus may also undergo early crossover designation, thus also dominating
crossover interference patterns per se. Importantly, Zip3 foci are used for
diagnosis of crossover interference relationships. Zip3 foci form as a specific
consequence of programmed crossover designation; they do not mark the sites
of non-interfering crossovers, which exhibit an entirely different pattern along
the chromosomes. Furthermore, formation of Zip3 foci is upstream of, and
thus insensitive to, defects in later events, including (1) major perturbations in
the kinetics of recombination or the fidelity with which initiated events
(crossover-fated and/or non-crossover-fated) proceed to their assigned fates
(see, for example, ref. 14) or (2) the potential occurrence of additional DSBs due
to delayed synaptonemal complex formation (discussion in refs 8 and 56).


Thus, none of the recombination aberrancies detected by physical analysis of
recombination in the analysed mutants (below) is relevant to their crossover
interference phenotypes. Correspondingly, although all mutants give exactly
the same crossover patterns (interference and crossover number) as defined by
Zip3 foci, the mutants vary widely with respect to DNA recombination
phenotypes. The results below can be summarized to say that (1) absence of
Slx5/8-Sir2 STUbL activity has little, or only subtle, effect(s) on recombination,
whereas (2) absence of TopoII or TopoII catalytic activity confers delays and
aberrancies. b, DSBs, SEIs and dHJs. Progression through recombination is
very similar to WT in sir2RK and slx5Δ. Both pCLB2-TOP2 and pCLB2-TOP2
top2YF exhibit a phenotype corresponding to delayed progression beyond the
point of crossover designation: DSBs appear on time; however, DSBs,
single-end invasions (SEIs) and double Holliday junctions (dHJs) all accumulate to
higher than normal levels at later than normal times, implying delayed
progression of crossover-designated DSBs to SEIs, and of SEIs to dHJs, where
SEIs and dHJs are both crossover-specific intermediates. There is no
significant alteration in homologue-versus-sister bias in any of the four
mutants, with inter-homologue dHJs predominating over inter-sister dHJs
similarly to WT in all cases. c, Inter-homologue crossover (CO) and
non-crossover (NCO) products. Inter-homologue crossover and non-crossover
levels are very similar to WT in pCLB2-TOP2 and show variations relative to
WT in the other mutants. A differential deficit of crossovers versus
non-crossovers in pCLB2-TOP2 top2YF suggests a specific defect in crossover
maturation in this mutant.
Extended Data Figure 9 | The metric of crossover interference is physical
axis length (micrometres). a, This study considered two different condensin
mutants, ycs4S and pCLB2-BRN1. Axis length is normal in ycs4S and longer
than normal in pCLB2-BRN1. Analysis presented for chromosome XV in
pCLB2-BRN1 (Fig. 5) was also done on chromosome III in that mutant
background (right column), confirming that coefficient of coincidence
relationships are WT when the metric is physical chromosome length but not
when the metric is genomic distance. We similarly analysed chromosomes III
and XV in the ycs4S background (left and middle columns), confirming WT
coefficient of coincidence relationships by both metrics. b, Zip3 focus analysis
for chromosome XV in the indicated strains (red; from Fig. 5) and beam-film
simulation analysis (green). Best-fit simulations could be obtained for all strains
using the same parameter values as for WT meiosis, including interference
distance (L_BF ~0.3 µm), except that the number of precursors (N) had to be
varied linearly with axis length. For the indicated strains, from left to right,
N = 17, 13, 12, 10, 9 and 8. This result implies direct interplay between physical
chromosome length (micrometres of synaptonemal complex) and DSB
probability, as discussed elsewhere. c, d, For the mutant cases described in
b, experimentally observed average numbers of Zip3 foci vary linearly with axis
length (c). In contrast, different numbers of Zip3 foci are observed for the
different strains despite the fact that chromosome XV has the same genomic
length in all cases (d). We also note that the best-fit simulation for BR zip1Δ had
to include a 10% decrease in the ‘efficiency of maturation of crossover-designated
interactions’, which, in the present context, implies that in a zip1Δ
background there is a 10% reduction in either (1) the stability of a Zip3 focus
under cytological spreading conditions in the absence of synaptonemal
complex or (2) the probability that a crossover designation will give a Zip3
focus.
Extended Data Table 1 | Strains used in this study


Strains 
NKY4146 
NKY4147 
NKY4148 
LZY1842 
LZY1570 
LZY1845 
LZY2306 
LZY2190 
LZY2237 
LZY2207 
LZY2262 
LZY2194 
LZY2187 
LZY2266 
LZY2054 
LZY2418 
LZY1983 
LZY2325 
LZY2319 
LZY1572 
LZY1667 
LZY2166 
LZY2012 
LZY1756 
LZY1702 
LZY1516 
LZY1723 
LZY2146 
LZY1718 
LZY1451 
LZY1201 
LZY1446 
LZY1986 
LZY932 
LZY2006 
LZY1163 
LZY1325 
LZY1261 
LZY1364 
LZY1471 
LZY1488 
LZY1472 
LZY773 
LZY1317 
LZY1386 
LZY1318 
LZY1504 
LZY2018 
LZY2080 
LZY2313 
LZY2430 
LZY2341 
LZY446 
LZY447 
LZY1614 
LZY1617 
LZY2413 
LZY2414 
LZY2261 
LZY2255 
LZY2198 
LZY2199 


Genotype
HMR::LacO-URA3/, URA3::CYC1p-LacI-GFP/, ZIP3-13myc::Hygromycin
URA3::CYC1p-LacI-GFP/", scp1(Ch XV telomere)::LacO-LEU2/, ZIP3-13myc::Hygromycin
leu2::LacI-GFP::Clonat/", tel4::226xLacO::Kan/, ZIP3-13myc::Hygromycin
as NKY4146, except pCLB2-TOP2:KanMX/"
as NKY4147, except pCLB2-TOP2:KanMX/"
as NKY4148, except pCLB2-TOP2:KanMX/
as NKY4147, except top2-SNM::KanMX/"
as NKY4147, except pCLB2-TOP2:KanMX/", top2(Y782F):URA3
as NKY4147, except ubc9-GFP::KanMX/"
as NKY4147, except red1::kanMX6/", LEU2::pYl-red1KR
as NKY4147, except pCLB2-TOP2:KanMX/", tel1D::KanMX/"
as NKY4147, except pCLB2-TOP2:KanMX/", spo11-HA3His6::KanMX4/"
as NKY4147,
as NKY4147,
as NKY4147, except slx5D::natMX/"
as NKY4148, except slx5D::natMX/"
as NKY4147, except slx8D::natMX/"
as NKY4147, except slx5D::nat7::slx5-sim(1-4)::KanMX/
as NKY4147, except slx8-SS::natMX/"
as NKY4147, except sir2D:KanMX/
as NKY4146, except sir2D:KanMX/
as NKY4148, except sir2D:KanMX/"
as NKY4147, except sir2D::KanMX4::Sir2-R139K::natMX/"
as NKY4147, except sir2-345::natMX/"
as NKY4147, except sir2-DC500::KanMX/sir2-DC500::natNT2
as NKY4147, except sir3D::LEU2/"
as NKY4147, except sir4D::KanMX/sir4::natNT2
as NKY4147, except esc2D::KanMX/"
as NKY4147, except esc8D::KanMX/"
as NKY4147, except hst1D::KanMX/"
as NKY4147, except naj1D::KanMX/"
as NKY4147, except hta1-S128A/", hta2-S128A/"
as NKY4147, except pCLB2-NSE2::KanMX/"
as NKY4147, except dot1D::KanMX/"
as NKY4147, except smc6-9::NAT/"
as NKY4147, except ndt80D::LEU2/", REC8-3HA::URA3/+, pCLB2BRN1::KANMX4/"
as NKY4146, except ndt80D::LEU2/", REC8-3HA::URA3/+, pCLB2BRN1::KANMX4/"
as NKY4147, except ndt80D::KanMX/", REC8-3HA::URA3/+, ycs4S/"
as NKY4146, except ndt80D::KanMX/", REC8-3HA::URA3/+, ycs4S/"
as NKY4146, except pch2D::KanMX/"
as NKY4148, except pch2D::KanMX/"
as NKY4147, except pch2D::KanMX/"
as NKY4147, except cdc6::kanMX6::PSCC1:3-HA-CDC6/", ndt80::LEU2/"
as NKY4147, except mlh1D::KanMX/"
as NKY4147, except mlh3D::KanMX/"
as NKY4147, except mms4D::KanMX/

as NKY4147, except msh2::LEU2/’ 

as NKY4147, except sir2D::KanMX4::Sir2-R139K::nat/", pCLB2-TOP2.:KanMx/" 
as NKY4147, except sir2D::KanMX4::Sir2-R139K::nat/", sIx5D::natMx/" 

as NKY4147, except s/x5D::natMX/", red1::kanMx6/", LEU2::pYl-red1KR, 
as NKY4147, except s/x5D::natMX/", top2-SNM::KanMx/" 

as NKY4147, except top2-SNM::KanMX, red1::KanMxX, LEU2-red1KR 
ho::hisG leu2 ura3 nuc1::hygroB HIS4::LEU2-(BamHl+tori), MAT alpha 
ho::hisG leu2 ura3 nuc1::hygroB his4-x::LEU2-(NgoMIV+ori)--URA3, MAT a 


as LZY446, 
as LZY447, 
as LZY446, 
as LZY447, 
as LZY446, 
as LZY447, 
as LZY447, 
as LZY447, 


except pCLB2-TOP2::KanMxX 

except pCLB2-TOP2::KanMx 

except pCLB2-TOP2::KanMX, URA3.:top2(Y782F) 
except pCLB2-TOP2::KanMX, URA3:.top2(Y782F) 
except sixSD.:natMx 

except six5D.:natMx 

except sir2D::KanMX4::Sir2-R139K: :nat 

except sir2D::KanMX4.:Sir2-R139K::nat 


All strains are isogenic derivatives of SK1 with ho::hisG, leu2 and ura3. 
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except pCLB2-TOP2:KanMX/", spo11-HA3His6::KanMX4/spo 1 1(D290A)-HA3His6: :KAnNMX4 
except pCLB2-TOP2:KanMX/", spo11-HA3His6::KanMX4/spo 1 1-(Y 135F)-HA3His6::KanMX 
Structure of class C GPCR metabotropic glutamate receptor 5 transmembrane domain

Andrew S. Doré1*, Krzysztof Okrasa1*, Jayesh C. Patel1*, Maria Serrano-Vega1*, Kirstie Bennett1, Robert M. Cooke1,
James C. Errey1, Ali Jazayeri1, Samir Khan1, Ben Tehan1, Malcolm Weir1, Giselle R. Wiggin1 & Fiona H. Marshall1


Metabotropic glutamate receptors are class C G-protein-coupled receptors which respond to the neurotransmitter glu- 
tamate. Structural studies have been restricted to the amino-terminal extracellular domain, providing little under- 
standing of the membrane-spanning signal transduction domain. Metabotropic glutamate receptor 5 is of considerable 
interest as a drug target in the treatment of fragile X syndrome, autism, depression, anxiety, addiction and movement 
disorders. Here we report the crystal structure of the transmembrane domain of the human receptor in complex with the 
negative allosteric modulator, mavoglurant. The structure provides detailed insight into the architecture of the trans- 
membrane domain of class C receptors including the precise location of the allosteric binding site within the transmem- 
brane domain and key micro-switches which regulate receptor signalling. This structure also provides a model for all class 
C G-protein-coupled receptors and may aid in the design of new small-molecule drugs for the treatment of brain disorders. 


Glutamate is the major excitatory neurotransmitter in the central ner- 
vous system. It mediates its activity through ionotropic channels and 
eight metabotropic G-protein-coupled receptors (GPCRs) which are 
expressed in neuronal and glial cells. The metabotropic glutamate (mGlu) 
receptors can be divided into three groups on the basis of their sequence 
similarity, pharmacology and transduction mechanisms. mGlu5 and
mGlu1 are class I mGlu receptors, which are primarily located postsynaptically
and couple to the Gq/11 pathway’. mGlu5 is abundant throughout
the cortex, hippocampus, striatum, caudate nucleus and nucleus
accumbens’, areas involved in emotion, motivation and cognition. mGlu5
is a promising therapeutic target, and negative allosteric modulators
(NAMs), which reduce mGlu5 receptor activation, are undergoing clinical
trials for the treatment of fragile X syndrome, depression, anxiety,
migraine and dyskinesias**, whereas positive allosteric modulators (PAMs),
which increase receptor activation, may be beneficial in schizophrenia
and cognitive disorders’.

GPCRs can be divided into four major classes (A, B, C and F) on the 
basis of sequence similarity®. Structures of over 20 class A receptors”®, 
one class F’, and two class B receptors’®"' have been solved to date. Class 
C GPCRs include mGlu receptors, y-aminobutyric acid B-type recep- 
tors (GABAg), calcium-sensing receptors (CaS), taste receptors (TAS) 
and several orphan receptors’”*. mGlu receptors have an unusual struc- 
ture comprising a large extracellular domain consisting of the ‘venus fly 
trap’ (VFT), which binds glutamate and a cysteine-rich domain (CRD), 
linked to the seven-transmembrane domain (TMD). A characteristic of 
class C GPCRs is that they exist as dimers, which in the case of the mGlu 


Figure 1 | Schematic and ribbon representation of the mGlu5 structure.
a, b, Ribbon representation of mGlu5 in rainbow colouration (N terminus, blue;
C terminus, red) viewed parallel to the membrane and from the extracellular
space, respectively. Mavoglurant is represented in translucent stick
representation. Carbon, nitrogen and oxygen atoms are coloured magenta, blue
and red, respectively. The position of the T4L insertion in ICL2 is indicated.


1Heptares Therapeutics Ltd, BioPark, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, UK. 


*These authors contributed equally to this work. 
family is mediated by interactions between the VFT and TMDs. Crystal
structures of several VFT domains have been solved’*"*; however,
structural information on the TMD has until recently remained elusive.
This domain is of particular importance as drug discovery efforts
have focused on the identification of allosteric modulators which bind
within this region and allow greater subtype selectivity'’*'*'®. To advance
our understanding of the mode of action of class C GPCRs and to enable
structure-based drug design of allosteric modulators, we have determined
the crystal structure of the TMD of the human mGlu5 receptor
in complex with the NAM mavoglurant (AFQ056)*, which is currently
in phase III clinical trials for fragile X syndrome.


Structure determination 


To determine the TMD structure of mGlu5, a thermostabilized receptor
(StaR) was generated as previously described'”"’. The receptor was
thermostabilized in the presence of the allosteric radioligand
2-methyl-6-((3-methoxyphenyl)ethynyl)-pyridine ([3H]-M-MPEP)” (Extended
Data Fig. 1a) and contains six mutations, none of which is located in
the allosteric binding site (Extended Data Fig. 1b).

To facilitate crystallization, the flexible domains were removed from
the N terminus (residues 2-568) and C terminus (residues 837-1153,
including 2 residues from the predicted helix 8), and T4-lysozyme (T4L)
was inserted into intracellular loop (ICL) 2 between Lys 678 and Lys 679
(Fig. 1a). The affinity of [3H]-M-MPEP for this construct (mGlu5 StaR(569-836)-T4L)
does not differ from that for the wild-type receptor (Kd = 0.86
± 0.04 nM and 1.05 ± 0.15 nM, respectively; Extended Data Table 1),
and there is no difference in the affinities of a range of allosteric
modulators for the wild-type and StaR (Extended Data Table 1 and Extended
Data Fig. 2). Modifications of the protein required for crystallization
preclude G protein coupling. In the absence of the glutamate-binding
domain, mavoglurant cannot act as an allosteric modulator.
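
The construct design described above amounts to slicing the receptor sequence and splicing in the fusion partner. The following is a minimal sketch only, assuming 1-based residue numbering; the function name `build_construct` and the placeholder sequences are hypothetical (the real mGlu5 and T4L sequences are not reproduced here, and the thermostabilizing mutations are ignored).

```python
def build_construct(receptor_seq, insert_seq, start=569, end=836, insert_after=678):
    """Keep residues start..end (1-based, inclusive) of receptor_seq and
    splice insert_seq in directly after residue insert_after."""
    if not (1 <= start <= insert_after < end <= len(receptor_seq)):
        raise ValueError("residue numbers out of range")
    n_part = receptor_seq[start - 1:insert_after]  # residues 569..678
    c_part = receptor_seq[insert_after:end]        # residues 679..836
    return n_part + insert_seq + c_part


# Placeholder data (assumptions, not real sequences): a 1153-residue
# receptor of 'X' and a 10-residue stand-in for T4L written as 't'.
receptor = "X" * 1153
t4l_placeholder = "t" * 10
construct = build_construct(receptor, t4l_placeholder)
```

With these placeholders the receptor-derived part is 268 residues long (569 to 836 inclusive), with the insert beginning after the first 110 of them.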

The structure of mGlu5 bound to methyl
(3aR,4S,7aR)-4-hydroxy-4-[(3-methylphenyl)ethynyl]octahydro-1H-indole-1-carboxylate
(mavoglurant) was determined to 2.6 Å by merging diffraction data from
5 crystals grown in lipidic cubic phase (LCP). The structure was solved
by molecular replacement with one copy of the receptor in the asymmetric
unit (Extended Data Figs 3 and 4). Details of data collection and
refinement are in Extended Data Table 2.


Overall architecture of mGlu5


The TMD of mGlu5 comprises seven transmembrane helices (TM1-7)
(Fig. 1a, b). Residues are assigned numbers (in superscript) based on a
modification of the Ballesteros-Weinstein system” previously suggested’
for class C (Extended Data Table 3). Continuous density is observed
for the intracellular loops ICL1 (forming a short α-helix) and ICL3, and for
the extracellular loops (ECL) 1 and 3, which lack secondary structure.
Cys 644°? at the N-terminal end of TM3 forms a conserved disulphide 
bond with Cys 733 in ECL2 (Fig. 1a). ECL2 spans the top of the recep- 
tor interacting with the N terminus of TM1 (Tyr 730 to Asp 577), the C 
terminus of TM2 (Val729 to Ala 6377>* and Leu 731 to Leu635""), 
ECL1 and TM3 (Leu 731 to Gln 647°*”). The disulphide bond between 
the top of TM3 and ECL2 is critical in anchoring ECL2, however, the 
interaction with Gln 647°” also appears conserved with glutamine or 
arginine at the equivalent position in 92% of class C receptors (percen- 
tage conservation values used refer to vertebrate class C receptors as 
shown in Extended Data Table 3). 
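
Percentage conservation figures of this kind can be computed from one column of a multiple sequence alignment. The sketch below is illustrative only: the function name `percent_conserved`, the example column and the choice to ignore gap characters are all assumptions, not the procedure used for Extended Data Table 3.

```python
def percent_conserved(column, allowed):
    """Percentage of aligned sequences whose residue at this position is in
    `allowed`. Gap characters ('-') are ignored (an assumption)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    hits = sum(1 for r in residues if r in allowed)
    return 100.0 * hits / len(residues)


# Hypothetical alignment column (one letter per receptor), not real data.
example_column = list("QQRQQKQRQQQQ")
q_or_r = percent_conserved(example_column, {"Q", "R"})
```

Passing a set of residues rather than a single letter matches statements such as "glutamine or arginine at the equivalent position".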

The configuration of the helical bundle and the position of ECL2 com- 
bine to severely restrict the entrance to the allosteric pocket (Fig. 2). This 
is consistent with native ligands binding to the extracellular domain, 
and as such there is no requirement for a wide entrance to the binding 
pocket in the transmembrane domain. At the C terminus of the receptor, 
an outward kink at the end of TM7 is propagated by Pro 820’°° and 
Lys 8217*'; these two residues are highly conserved (> 90%) across class 
C receptors, suggesting that the helix kink may be a common structural 
feature in class C. 
Figure 2 | The narrow entrance to the allosteric pocket in mGlu5. Surface
representation of the mGlu5 TMD from two orientations separated by 90°.
a, View parallel to the membrane plane. b, View from the extracellular space.
The surface of ECL2 is coloured yellow, and the rest of the receptor green.
Mavoglurant is represented in a space-filling representation, with carbon,
nitrogen and oxygen atoms coloured magenta, blue and red, respectively. The
allosteric pocket exhibits a narrow opening (ellipsoid denotes theoretical
entrance dimensions at widest point in a) as a result of TM helix conformations
and a network of interactions anchoring ECL2 across the top of the receptor.


Comparison with class A and B GPCRs 

The superfamily of GPCRs is presumed to have evolved from a common
ancestral receptor as multiple sequence motifs are shared across
two or more subfamilies**. With the mGlu5 structure in hand, we could
explore a structural alignment between class A, B and C subfamilies.
Comparisons with mGlu5 have been performed using rhodopsin (PDB
ID: 1F88), which shows the closest alignment from class A”’, and the
CRF1R structure (PDB ID: 4K5Y)”° from class B, with all structures in
the inactive state.

Global (all atom) superposition of the mGlu5 structure with rhodopsin
and CRF1R, and local superposition of TM helices (Extended
Data Table 4), reveal that the consensus across TM positions appears
best across the intracellular halves of the receptors (Fig. 3a-f), consistent
with the structural constraints of G protein coupling, and with the
highest levels of structural diversity observed across the extracellular
portions. TM1 is structurally well aligned across all three receptor classes,
differing most at the N termini. In class A and B receptors, similar sets
of interactions between TM1 and TM7 influence the trajectory of the
extracellular portion of TM7. In mGlu5 this is closest to the TM7 of
rhodopsin, yet shifted by 5 Å towards the centre of the helical bundle
and towards TM2, contributing to the narrow entrance to the allosteric
pocket (Fig. 3c).

The extracellular trajectory of TM2 in mGlu5 more closely resembles
that of CRF1R (Fig. 3f), appearing straighter and without the bend
towards TM1 observed in several class A structures. However, the
extracellular half of TM2 in mGlu5 is shifted closer to the central axis of the
receptor in comparison to CRF1R by ~2.5 Å due to interaction with the



Figure 3 | Superposition and comparison of mGlu5 with rhodopsin and
CRF1R. The superimposed structures of mGlu5, CRF1R (PDB ID: 4K5Y) and
rhodopsin (PDB ID: 1F88) are represented as green, blue and yellow ribbons,
respectively. a-c, mGlu5 superposed with rhodopsin is viewed parallel to the
membrane plane (a), from the intracellular side (b) and extracellular side (c).
highly conserved Tyr 629*°° (89% tyrosine or phenylalanine) packing
against Gly 590’? (90% conserved), and towards TM7. The centre of
TM2 in mGlu5 is closer to the central axis, which constricts the
extracellular side and moves the intracellular end of TM2 further out from
the helical bundle.

The most striking difference between mGlu5, rhodopsin and CRF1R
is the position of TM5. Though the extracellular portion of TM5 in mGlu5
grossly follows that of CRF1R, the entire helix is positioned further
inwards in comparison to both rhodopsin and CRF1R (Fig. 3b, c, e, f)
by approximately 6 Å, which further contributes to the narrow entrance
to the allosteric cavity.


Interactions in conserved motifs 

The TMD of mGlu5 is considered to function in a similar way to class
A receptors”. The isolated monomeric TMD of mGlu5 couples to G
proteins in response to a PAM”. GPCRs exhibit a set of conserved
microdomains that are important in signal transduction**”’. One highly
conserved motif is the E(D)RY sequence in TM3, termed the ‘ionic lock’,
which bridges TM3 and TM6. In rhodopsin, a salt bridge is formed
between Arg 135°’ and Glu 247°” and is considered a hallmark of the
inactive state. Similarly, in the mGlu5 structure Lys 665°°° in TM3 forms
a salt bridge with Glu 770°*? (2.7 Å distance) in TM6, and interacts with



d-f, mGlu5 superposed with CRF1R is viewed parallel to the membrane plane
(d), from the intracellular side (e) or from the extracellular side (f). For both
superpositions, rotations indicated for panels b, e and c, f are relative to
a, d, respectively. Red arrows denote large differences in transmembrane
positions between class C, class A and class B receptors.
Figure 4 | The ‘ionic-lock’ motifs in mGlu5 and rhodopsin. a, b, mGlu5
shown as green ribbons (a) and rhodopsin as yellow ribbons (b).
Interacting residues are shown as sticks with carbon, nitrogen and
oxygen atoms coloured grey, blue and red, respectively. Specific
interactions depicted as dashed red lines (distances labelled).
c, Constitutive activity of mGlu5 mutants compared to wild type
(WT). IP1 response is expressed as percentage of wild type, after
normalization to number of receptors per cell for each construct
(see Extended Data Fig. 5). Error bars indicate standard error of mean;
P values derived from an unpaired two-tailed t-test. Data are representative
of three independent experiments.
Ser 613 (2.4 Å distance) in ICL1 (Fig. 4a, b). Additionally, the less well
conserved Arg 668°”? is within hydrogen bonding distance of Ser 614
in ICL1 (3.1 Å distance), thus making a secondary lock further tethering
TM6 to TM3 via ICL1 (Fig. 4a). Two of the stabilizing mutations
flank Arg 668°? (N667Y and I669A). However, Tyr 667 faces outwards
and is unlikely to influence this network of interactions.

Consistent with a role in maintaining an inactive conformation,
disruption of the ionic lock through mutation of Ser 613, Glu 770°*° or
Lys 665°” to alanine significantly increases constitutive activity
compared to wild type (Fig. 4c). Mutating Ser 613 to lysine also results in
higher levels of constitutive activity, caused by charge repulsion with
Lys 665°*°. Importantly, substituting Ser 614 with aspartic acid, to create
a stronger ionic interaction with Arg 668°~° than the naturally occurring
hydrogen bond, boosts this secondary lock and significantly
decreases constitutive activity (Fig. 4c). Taken together, these results
provide functional evidence for the role of this network in modulating
receptor activity.

Lys 665°°°, Glu 770°”? and Ser 613 are highly conserved (>77%) across
class C GPCRs including mGlu, GABAB, CaS and TAS1 receptors. In
both GABAB subunits, the glutamate is substituted with aspartate. In
the TAS1 taste receptors these residues vary: in TAS1R3 the glutamate
and arginine in TM6 are swapped in position and the lysine is
replaced with asparagine. In TAS1R1/TAS1R2, Lys 665°” is replaced
with arginine and glutamine is found at the equivalent position to Arg
668°°*. Mutations in one or more of these residues have been found to
alter the signalling of GABAB2, the signalling subunit of the GABAB
receptor, and the CaS receptor’. This region is also important in human
disease as the mutation E781K°*° in mGlu6 leads to congenital night
blindness and altered G protein coupling”’. Collectively, these data
highlight the functional importance of this network within class C receptors.

Using the mGlu5 structure, it is possible to rationalize the role of other
residues highly conserved across class C receptors. Leu 6227? (92% con- 
served) in TM2 points inwards towards a conserved aromatic cage formed 
by residues in TM1, TM2 and TM7. Leu 750°” (100% conserved) and 
Tyr 746°*° (83% conserved) in TM5 appear structurally important in 
stabilizing TM3, with Tyr 746°*° additionally bridging TM5 to TM4. 
The hydroxyl group of Tyr 746”*° interacts with the backbone carbonyl 
of Leu 700*”? in TM4, breaking the helical hydrogen bonding pattern 
and influencing the trajectory of the extracellular portion of TM4. 

The conserved NP’°xxY(x)5,6F motif, found at the junction of TM7
with helix 8 in class A receptors, undergoes substantial rearrangements
during activation*"”. The conserved tyrosine (Tyr 306”°? in rhodopsin)
faces inwards, moving to fill the space created by the outward movement
of TM6 upon activation. In mGlu5 the conserved motif F/Y/HxPKxY
found at the intracellular end of TM7 appears to play an analogous role
but in a different way (Fig. 5a, b). First, Tyr 823”? faces out towards the
membrane, C-terminal from the kink at the intracellular end of TM7,
and ~8.3 Å away from Tyr 306”*° in rhodopsin (measured between Cα
positions). It is therefore unlikely that Tyr 823”°? in mGlu5 performs a
similar role to that of the conserved tyrosine in the NPxxY(x)F motif
during class A activation. Instead, the highly conserved Lys 821”! and/or
Figure 5 | Molecular details of further conserved sequence motifs in mGlu5
and comparison to rhodopsin. a-d, Residues in the FxPKxY and NPxxY
motifs (a, b) and residues in the FxxCWxP motif (c, d), in mGlu5 and
rhodopsin, respectively. mGlu5 and rhodopsin colour schemes as in Fig. 4.
Interacting residues are shown as sticks with carbon, nitrogen and oxygen
atoms coloured grey, blue and red, respectively. Specific interactions are
depicted as dashed red lines with distances labelled.
Phe 818’** in mGlu5 superpose across the position of Tyr 306”? in
rhodopsin, with both residues capable of stabilizing the inter-helical
space created by the outward movement of TM6.

Finally, the highly conserved FxxCWxP°*? motif in TM6 of rhodopsin
contains the tryptophan at position 6.48 known as the toggle switch,
which alters rotamer states upon activation. A kink induced in TM6 by
Pro 267°°° in rhodopsin, along with rotamer changes of the tryptophan,
forms the basis for the outward movement of the intracellular end of
TM6 upon activation. Class C receptors also have a highly conserved
(83%) tryptophan in the equivalent position in TM6. However, in the
mGlu5 structure, TM6 has moved laterally away from TM7 compared
to rhodopsin by 3.3 Å. A potential bifurcated hydrogen bond from the
main-chain carbonyl of Leu 744°* to Gly 748°** and the indole ring
of Trp 785°°” then bridges TM6 to TM5, with the observed rotation of
the tryptophan side chain proving critical for the observed binding
position of mavoglurant (Fig. 5c, d).


Allosteric modulator binding site 


Strong density for mavoglurant (Fig. 6b) is found in a pocket ~8 Å from
the receptor surface and spanning ligand positions observed previously
for class A and B (Fig. 6c). The ligand is oriented with an approximate
30° tilt (relative to the axis of the TM bundle) in a pocket defined by
residues from TM2, TM3, TM5, TM6 and TM7 (Fig. 6a).

The 3-methylphenyl ring of mavoglurant sits in a pocket between
Ala 810’“° and Pro 655°, bordered on one side by Ile 6257*°, Gly 628°”,
Ser 654°*?, Ser 658° and on the other by Tyr 659°. The alkyne linker
traverses a narrow channel between Tyr 659°, Ser 809”°”, Val 806”°°
and Pro 655°*°, with the hydroxyl substituent at the other end sitting
almost equidistant between, and within hydrogen bonding distance of,
the hydroxyl oxygens from Ser 809”? and Ser 805”*° (Fig. 6a, b). The
shape and interactions in this region of the binding pocket partly explain
the predominance of an alkyne linker in mGlu5 allosteric modulators.
The saturated bicyclic ring system sits within a mainly hydrophobic
pocket defined by Val 8067°°, Met 80277, Phe 788°, Trp 785°>", Leu
744° Ile 651°°°, Pro 655°" and Asn 747°”. The Asn 747°” δ-nitrogen
hydrogen bonds to the carbonyl oxygens of Gly 652°” and the carbamate
tail of mavoglurant, with the methoxy terminal portion of the carbamate
sitting in a cavity defined by Leu 744°*, Pro 743°” and Ile 651°*°. The
importance of many of these residues in allosteric ligand binding to mGlu5
has been suggested previously in extensive mutagenesis studies”.
Figure 6 | The mGlu5 allosteric modulator binding site. a, Diagram of ligand
interactions in the binding pocket. The colour scheme of the boxes follows the
rainbow colouration as in Fig. 1b. Hydrogen bonds depicted as dashed red
lines with distances between heavy atoms in angstroms. b, View from the
membrane of specific interactions in the binding pocket. Extracellular portions
In the mGlu5 structure the 3-methyl substituent of mavoglurant is
within 4 Å of a network of hydrogen bonds involving the side chains
of Tyr 659°**, Thr 781°“°, the main-chain carbonyl of Ser 809”*? and
a water molecule at the bottom of the allosteric pocket (Fig. 6b). Varying
the 3-methyl substituent on the phenyl ring in related compounds
from methoxy to chloro to fluoro switches the ligand from a NAM to a
neutral binder to a PAM, respectively”’. Furthermore, mutation of Thr
781° (Thr 780 in rat) and Ser 809” *° (Ser 808 in rat) to alanine also switches
the pharmacology of alkyne type PAMs**. Although the side chain of
Thr 781°*° is directly involved in the hydrogen bond network described
above, the side chain of Ser 809”*? makes a hydrogen bond to the
carbonyl of Ser 805’ in addition to mavoglurant. This induces a small kink
in TM7 (ref. 38) and orients the main-chain carbonyl of Ser 8097? to
engage with the water (Fig. 6b). The observed changes in ligand
pharmacology resulting from perturbation of the network around the water
point to a potential activation switch holding TM7, TM3 and TM6
together at the base of the pocket, and a structural basis for modulation
of the conformational spectra of the receptor.

While this manuscript was under review, the structure of the
transmembrane domain of the related class C receptor mGlu1 in complex with
a different class of NAM,
4-fluoro-N-(4-(6-(isopropylamino)pyrimidin-4-yl)thiazol-2-yl)-N-methylbenzamide
(FITM), was published”. The
two structures are in close agreement, with a number of the molecular
features described here conserved across subtypes. However, in
comparison to the position of mavoglurant in mGlu5, FITM is found higher
in the mGlu1 allosteric site, a position more analogous to that of
compounds bound in the orthosteric site of class A receptors, highlighting
the potential for different allosteric modulator binding modes across this
class of receptors.


Conclusions 


The structure of the TMD of the mGlu5 receptor provides a detailed
view of the core domain of a class C GPCR, complementing the
extracellular domain structures. Alongside structures for class A, B and F
receptors, this completes initial structural coverage of the classes of
GPCRs, enabling conserved and divergent structural and mechanistic
features to be identified. The mGlu5 receptor structure also increases
our understanding of the mechanism of action of allosteric modulators
for metabotropic receptors and will enable the design of both negative
and positive allosteric modulators by providing a template for homology
of TM5/TM6 removed for clarity. Fo-Fc OMIT density contoured at 2.0σ,
calculated before mavoglurant inclusion in the model; mavoglurant shown as
sticks as in Fig. 1a. c, The location of mavoglurant (magenta) in comparison
to an antagonist (blue) bound to CRF1R, and a selection of ligands (yellow)
bound to class A receptors.


31 JULY 2014 | VOL 511 | NATURE | 561 


©2014 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


modelling of related receptors. Drugs directed at this class of receptors 
show great promise in the treatment of severe neuropsychiatric disorders. 


METHODS SUMMARY 


Conformationally thermostabilized human mGlu5 lacking the N-terminal
extracellular domain and with T4-lysozyme fused into intracellular loop 2 was expressed
in baculovirus-infected insect cells and purified in n-dodecyl-β-D-maltopyranoside
(DDM) using a histidine-tag. Crystals were grown in LCP at 20.0 °C using a
monoolein/cholesterol mixture. Diffraction data from 5 crystals, collected at Diamond
Light Source beamline I24, were used to solve the structure by molecular replacement
using the structures of T4-lysozyme and an ensemble of 8 GPCR TMD structures as
initial search models. The structure was refined to 2.6 Å with good statistics
(Rwork/Rfree = 23.9/27.4%). For further details of experimental procedures see Methods.
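
For readers unfamiliar with the refinement statistics quoted above, the conventional crystallographic R-factor compares observed and calculated structure-factor amplitudes. The expression below is the standard textbook definition, given as background rather than taken from this paper; the symbols F_obs and F_calc do not appear in the original text.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rwork is evaluated over the reflections used in refinement, and Rfree over a test set of reflections that is excluded from refinement.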


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


StaR generation. Full-length human mGlu5 was used as background for the
generation of the conformationally thermostabilized receptor using a mutagenesis
approach described earlier’”*°. Mutants were analysed for thermostability in the
presence of the radioligand [3H]-M-MPEP (Extended Data Fig. 1a). The mGlu5
StaR contains 6 thermostabilizing mutations (see Extended Data Fig. 1b).

Cell culture. HEK293T cells were cultured in DMEM supplemented with 10% (v/v)
fetal bovine serum (FBS). Cells were transfected using GeneJuice (Merck Millipore)
according to the manufacturer’s instructions and collected after 48 h.
Thermostability measurement. Transiently transfected HEK293T cells expressing
wild-type receptors were solubilized in 50 mM HEPES pH 7.5, 150 mM NaCl
assay buffer containing 1% (w/v) n-dodecyl-β-D-maltopyranoside (DDM) and
EDTA-free cOmplete Protease Inhibitor (Roche) tablets for 1 h rotating at 4 °C.
Crude lysates were cleared by centrifugation at 16,000g for 15 min and solubilized
receptors purified on Ni-NTA beads. Immobilized receptors were washed in assay
buffer containing 0.025% DDM, 20 mM imidazole before elution using assay buffer
with 0.025% DDM, 100 mM histidine. Receptor thermostability was measured by
incubating with 50 nM [3H]-M-MPEP for 1 h at 4 °C followed by 30 min at varying
temperatures. Unbound radioligand was separated by gel filtration and levels of
ligand-bound receptor were determined using a liquid scintillation counter. Thermal
stability (apparent Tm) was defined as the temperature at which 50% ligand binding
is retained.
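
The definition of apparent Tm above (the temperature at which 50% of ligand binding is retained) can be illustrated with a simple interpolation. This is a sketch under stated assumptions, not the analysis used in the paper: the function name `apparent_tm` and the data points are hypothetical, binding is expressed as a fraction of the unheated control, and linear interpolation stands in for whatever curve fitting was actually applied.

```python
def apparent_tm(temps, binding):
    """Return the temperature at which fractional binding crosses 0.5.

    temps   -- temperatures in ascending order (deg C)
    binding -- ligand binding as a fraction of the unheated control
    Uses linear interpolation between the two points that bracket 50%.
    """
    points = list(zip(temps, binding))
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        if b0 >= 0.5 > b1:
            return t0 + (b0 - 0.5) * (t1 - t0) / (b0 - b1)
    raise ValueError("binding does not cross 50% in the measured range")


# Hypothetical melting data, for illustration only.
temps = [4, 20, 30, 40, 50, 60]
binding = [1.00, 0.98, 0.90, 0.60, 0.20, 0.05]
tm = apparent_tm(temps, binding)
```

For these made-up points the crossing lies between 40 °C and 50 °C; a sigmoidal fit would normally be preferred when enough temperatures are sampled.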

Truncation and T4-lysozyme fusion constructs. To generate the minimal construct
for crystallization, the mGlu5 StaR construct was truncated at the N terminus
up to residue proline 569. In addition, the C terminus of the receptor was truncated
at alanine 836. In order to identify the best T4 lysozyme fusion construct, a matrix
was designed between Arg 671 and Gln 693 with 67 T4L fusion constructs generated.
Following transient expression in HEK293T cells, the fusions were analysed
and ranked using a thermal stability assay as well as fSEC analysis*’. Based on these
data, the best T4L position was found to be between Lys 678 and Lys 679. This final
construct is referred to as StaR(569-836)-T4L.

Expression in insect cells. StaR(569-836)-T4L carrying an N-terminal GP64
signal sequence and a C-terminal decahistidine tag was expressed in Sf21 cells
grown in ESF921 medium supplemented with 10% (v/v) FBS and 1% (v/v) penicillin/
streptomycin using the FastBac expression system (Invitrogen). Cells were infected
at a density of 2 × 10^6 cells per ml with baculovirus at an approximate multiplicity
of infection of 1. Cultures were grown at 27 °C and collected 48 h post-infection.
Membrane preparation and protein purification. All subsequent purification 
steps were carried out at 4°C. To prepare membranes, two litres of cells were 
resuspended in PBS buffer supplemented with protease inhibitor tablets and 
5 mM EDTA. Cells were disrupted by micro-fluidizer at 14 kPSI and membranes 
collected by ultracentrifugation at 204,700g for 1 h. Membranes were washed with 
PBS buffer supplemented with protease inhibitor tablets and 500 mM NaCl, col- 
lected by ultracentrifugation and resuspended in 40 mM HEPES pH 7.5, 250 mM 
NaCl and stored at −80 °C. Just before solubilization membranes were thawed,
homogenized, supplemented with 40 µM mavoglurant (GVK Bioscience) and 8 mM
iodoacetamide, and incubated on a roller mixer for 40 min. Membranes were sol- 
ubilized with 1.5% (w/v) DDM for one hour, insoluble material was removed by 
ultra-centrifugation and the solubilized lysate batch bound to 10 ml of Ni-NTA 
Superflow resin (Qiagen) for three hours in the presence of 10 mM imidazole. Resin 
was washed with a gradient of 10 to 50 mM imidazole in 40 mM HEPES pH 7.5,
250 mM NaCl, 0.05% (w/v) DDM, and 20 µM mavoglurant over 35 column volumes
before bound material was eluted in a step with 245 mM imidazole. Receptor was
further purified by gel filtration (SEC) in 40 mM HEPES pH 7.5, 150 mM NaCl,
0.03% (w/v) DDM, and 40 µM mavoglurant. Receptor purity was analysed using
SDS-PAGE and LC-MS, and receptor monodispersity was assayed by analytical 
SEC. Protein concentration was determined using the receptor’s calculated extinction
coefficient at 280 nm (ε280,calc = 58,730 (mg per ml × cm)⁻¹) and confirmed
by quantitative amino acid analysis. 

Radioligand binding. HEK293T cells were transiently transfected (as described 
above) to express either mGlu5 or mGlu5 StaR(569-836)-T4L. Membranes were
prepared as previously described”. For saturation binding experiments, HEK293T 
membranes transiently expressing mGlu5 or mGlu5 StaR(569-836)-T4L (2.5 µg per
well) were incubated with varying concentrations of [³H]-M-MPEP (final assay
concentration 0-30 nM) in the presence or absence of 0.1 mM MPEP to define
non-specific binding (assay buffer 50 mM HEPES (pH 7.4), 150 mM NaCl). For
competition binding experiments, the concentration of [³H]-M-MPEP was fixed
to ~1 nM and varying concentrations of cold compound (fenobam, dipraglurant,
MPEP, mavoglurant; 0.3 nM to 10 µM) were added to the reaction mixture. Binding
assays were incubated for 1 h at 25 °C; the reaction was terminated by rapid
filtration through 96-well GF/B filter plates pre-soaked with 0.1% polyethyleneimine
(PEI) using a 96-well head harvester (Tomtec, USA) and plates were washed with
5 × 0.5 ml water. Plates were dried, and bound radioactivity was measured using
scintillation spectroscopy on a Microbeta counter (PerkinElmer, UK).

Data were analysed using GraphPad Prism v5 (San Diego, USA). Saturation binding
data were globally fitted to one-site total and non-specific binding. Inhibition
curves were fitted to a four-parameter logistic equation to determine IC50 values,
which were converted to Ki values using Kd values determined by saturation binding
and the [³H]-M-MPEP concentration of ~1 nM.
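
The conversion from IC50 to Ki described here matches the form of the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd). The text does not name the equation, so treating it as Cheng-Prusoff is an assumption. The sketch below applies it with the radioligand concentration (~1 nM) and the Kd reported for the StaR construct (0.86 nM); the IC50 value and the function names are hypothetical.

```python
import math


def cheng_prusoff_ki(ic50_nm, ligand_nm, kd_nm):
    """Convert an IC50 to a Ki, assuming simple competitive binding at one site."""
    return ic50_nm / (1.0 + ligand_nm / kd_nm)


def pki_from_ki(ki_nm):
    """pKi is the negative log10 of Ki expressed in molar units."""
    return -math.log10(ki_nm * 1e-9)


# Kd and radioligand concentration as quoted in the text;
# the IC50 of 20 nM is a made-up value for illustration only.
ki = cheng_prusoff_ki(ic50_nm=20.0, ligand_nm=1.0, kd_nm=0.86)
pki = pki_from_ki(ki)
```

With the radioligand close to its Kd, the correction factor is roughly 2, so the Ki is about half the measured IC50.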

Constitutive activity. HEK293 cells were transiently transfected to express wild- 
type mGlu; or mutant constructs using 1 jig of plasmid DNA per 5 X 10° cells. Approx- 
imately 4 h post-transfection, cells were detached from plate using enzyme free cell 
dissociation buffer (Life technologies, USA) and total cell count was determined. 
Cells were seeded into a poly-L-lysine-treated 96-well plate at a density of 7 × 10⁴
cells per well and incubated overnight in a 37 °C incubator. The next day, cells were
incubated with HBSS buffer supplemented with 5 mM sodium pyruvate and 2 ul 
per ml glutamate pyruvate transaminase for 30 min in the 37 °C incubator to break
down endogenous glutamate. Following 1 h incubation with the IP1 stimulation
buffer, the levels of constitutive IP1 were determined using the IP-One HTRF assay kit
(Cisbio, France) according to the manufacturer’s guidelines. In each experiment, 
sixteen replicates were analysed for each construct. In parallel, cells from the same 
set of transfections were used to carry out whole cell saturation binding using
[³H]-M-MPEP essentially as described above in order to determine receptor numbers
under full equilibrium. The IP1 response was then normalized to the number of
receptors to account for the differences in cell numbers.

Crystallization. mGlu5 StaR(569-836)-T4L was crystallized in LCP at 20 °C. The
protein was concentrated to ~25 mg ml⁻¹ and mixed with monoolein (Nu-Check)
supplemented with 10% (w/w) cholesterol (Sigma Aldrich) and 50 µM mavoglurant
using the twin-syringe method. The final protein:lipid ratio was 40:60 (w/w).
50 nl boli were dispensed on 96-well glass bases and overlaid with 750 nl precipitant
solution using a Mosquito LCP from TTP Labtech. 40 µm plate-shaped crystals
of mGlu5 StaR(569-836)-T4L were grown in 100 mM 2-ethanesulphonic acid
(MES) across a pH range of 5.5-6.8, 100-250 mM (NH4)2HPO4, 24-34% (v/v)
polyethylene glycol 400, and 50 µM mavoglurant. A complete data set to 2.6 Å was
obtained by merging diffraction data from 5 crystals belonging to the monoclinic
space group C121. It was possible to mount single crystals for data collection, which
were flash-frozen and stored in liquid nitrogen without the addition of further 
cryoprotectant. 

Diffraction data collection and processing. X-ray diffraction data were measured
on a Pilatus 6M detector at Diamond Light Source beamline I24 using a beam
size of 10 × 10 µm. Crystals displayed slightly anisotropic diffraction, initially out
to 2.3 Å following exposure to a 70% attenuated beam for 0.5 s per degree of oscillation.
It was possible to collect ~30° of useful data from each crystal before radiation
damage became severe. Data from 5 individual crystals were integrated using XDS* 
and a complete data set compiled using the data collection strategy option of the 
program Mosflm“. Data merging and scaling was carried out using the program 
AIMLESS”. Data collection statistics are reported in Extended Data Table 2. 
Structure solution and refinement. The structure was solved by molecular replace- 
ment using the program Phaser**“° using two independent search models. First T4L 
from the adenosine A2A receptor structure (PDB ID: 3EML) was located by Phaser;
this was then ‘fixed’ and subsequent searches performed using an in-house library 
of truncated versions of receptor TMD structures present in the RCSB in an attempt
to locate the mGlu5 TMD. However, searches with single receptor structures
did not yield a solution for the mGlu5 TMD that could be validated through subsequent
refinement. Finally, a superposed ensemble of eight receptor structures,
truncated to a specific and highly sensitive core TM helix region, was required by
Phaser to solve the structure and provide an ‘averaged’ solution model that could
be validated in refinement. Initial refinement was carried out with REFMAC5“*“”. 
Manual model building was performed in COOT using sigma-A weighted 2Fo−Fc,
Fo−Fc maps in concert with simulated-annealing and simple composite omit
maps calculated in PHENIX” using autobuild. The later stages of refinement were 
performed using PHENIX implementing a combination of simulated annealing, 
TLS, positional and individual isotropic B-factor refinement. The structure was refined 
to 2.6 Å with Rwork/Rfree = 23.9/27.4%. Structure quality was assessed with MolProbity.
Refinement statistics are presented in Extended Data Table 2. 

Structure analysis. Structures were superposed and aligned for comparison pur- 
poses using the program COOT* to generate global structural superpositions. Local 
r.m.s.d. analysis between mGlu5, class B and selected class A GPCRs (Extended
Data Table 5) was performed using Maestro v. 9.3 (Schrödinger). Figures were prepared
using PyMOL (Schrödinger).

Extended Data Figure 1 | Comparison of wild-type and thermostabilized mGlu5.
a, The thermal stability of mGlu5 constructs measured using [³H]-M-MPEP binding
following DDM solubilization. Wild-type full-length mGlu5 (closed circles) has a Tm
of 20.6 ± 1.6 °C and mGlu5 StaR(569-836)-T4L (open circles) has a Tm of 27.2 ± 0.3 °C.
The inset shows the wild-type full-length mGlu5 data on a different scale. Data
represent the mean ± s.d. from 3 independent experiments. b, mGlu5 crystallization
construct (StaR(569-836)-T4L) in schematic representation. Thermostabilizing
mutations (green) are: E579A, N667Y, I669A, G675M, T742A, S753A. Residues forming
the allosteric pocket are pink. Disordered residues in the structure are grey. The
disulphide bond between Cys644 and Cys733 is denoted by a dashed yellow line.

Extended Data Figure 2 | Pharmacology of mGlu5 StaR(569-836)-T4L.
Experiments performed in membranes from HEK293T cells transiently
expressing mGlu5 StaR(569-836)-T4L. a, Saturation binding of [³H]-M-MPEP
to mGlu5 StaR(569-836)-T4L. Non-specific binding was determined by
addition of 0.1 mM MPEP. The data shown (mean ± s.e.m.) are representative of
three independent experiments performed in duplicate. Data were fitted
globally to a one-site saturation isotherm yielding a Kd of 0.86 ± 0.04 nM and
Bmax of 54.7 ± 9.5 pmol mg⁻¹. b, Competition binding. Membranes were
incubated with varying concentrations (3 nM-10 µM) of the mGlu5 NAMs
fenobam, dipraglurant, MPEP and mavoglurant. Inhibition curves were fitted
to a four-parameter logistic equation to determine IC50 values, which were
converted to Ki values using Kd values determined by saturation binding and
the [³H]-M-MPEP concentration of ~1 nM. The pKi values obtained are
shown in Extended Data Table 2. Data shown are the mean ± s.e.m. of three
independent experiments.


©2014 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


Extended Data Figure 3 | Crystal packing in the mGlu5 StaR(569-836)-T4L
monoclinic C121 system. a, View of the crystal lattice in the bc and ac planes
respectively. The single copy of the mGlu5 StaR(569-836)-T4L fusion present
in the asymmetric unit is shown as Cα-trace (magenta). Symmetry mates
shown as Cα-trace with the receptor TMD coloured green and T4L orange.
mGlu5 receptor TMDs stack along the c axis in layers mediated by T4L.
b, mGlu5 receptor TMDs pack through an interface mediated by antiparallel
TM4-TM4 interactions with the T4L domain swung out towards one side of
the receptor. c, The extracellular loops of the mGlu5 TMD are not constrained
by packing interactions. Receptor and T4L coloured as in a. Two specific
interactions are observed between the mGlu5 TMD N-terminal helical
extension and T4L moiety from a symmetry mate. d, The intracellular loops of
the mGlu5 TMD are not constrained by packing interactions. Receptor and
T4L coloured as in a. e, f, Result of 10 ns MD simulation on the mGlu5 TMD;
models shown are separated by 2 ns of simulation. Cα r.m.s.d. (between starting
and final model) = 2.0 Å.

Extended Data Figure 4 | T4L mediated contacts in the mGlu5 StaR(569-836)-T4L
monoclinic C121 system. T4L layers are held together by extensive
contacts between the T4L moieties. Specifically the N-terminal lobes of T4L
(residues 1002-1071) are shifted by up to 3.7 Å (distance between equivalent Cα
positions of E1022) in comparison to the input structure to molecular
replacement and refinement (residues 1002-1071 from PDB ID: 3EML). The
N-terminal lobes of T4L appear to pivot at V1071 to accommodate packing
interactions in the mGlu5 StaR(569-836)-T4L monoclinic C121 system,
forming an intramolecular β-sheet between residues Y1018-K1019 and
Y1018′-K1019′ of a symmetry mate, with additional hydrogen bonds between
the backbone of G1012-L1013 and G1012′-L1013′. Hydrogen bonds denoted
by dashed red lines, distances measured in ångströms.

Extended Data Figure 5 | Constitutive activity of mGlu5 mutants. IP1
response and number of receptors per cell in HEK293 cells transiently
transfected to express wild-type (WT) or the indicated mGlu5 mutants.
a, Constitutive activity of wild-type and mutant constructs expressed as
percentage of background IP1 levels detected in mock transfected control
sample. b, Numbers of expressed receptors per cell were determined using
[³H]-M-MPEP binding. Error bars indicate standard error of the mean and
P values are derived from an unpaired two-tailed t-test. Data are representative
of three independent experiments.




Extended Data Table 1 | Comparison of the pharmacology of mGlu5 and mGlu5 StaR(569-836)-T4L

a
Construct                       Kd (nM) ± s.e.m.    Bmax (pmol/mg) ± s.e.m.
mGlu5 wild-type full-length     1.05 ± 0.15         6.1 ± 0.6
mGlu5 StaR(569-836)-T4L         0.86 ± 0.04         54.7 ± 9.5**

(**P < 0.01).

b
Compound        mGlu5 wild-type full-length    mGlu5 StaR(569-836)-T4L
                pKi ± s.e.m.                   pKi ± s.e.m.
dipraglurant    7.49 ± 0.30                    7.24 ± 0.23
MPEP            8.46 ± 0.07                    8.10 ± 0.12

Binding analysis was performed in HEK293 membranes transiently transfected to express either mGlu5 or mGlu5 StaR(569-836)-T4L. a, Following saturation analysis the affinity (Kd) and expression levels (Bmax)
obtained were compared using an unpaired two-tailed t-test. Data from three independent experiments performed in duplicate. b, Competition binding analysis was carried out to compare the affinity of a range of
ligands. Inhibition curves were fitted to a four-parameter logistic equation to determine IC50 values, which were converted to Ki values using Kd values determined by saturation binding and the [³H]-M-MPEP
concentration of ~1 nM. The pKi values obtained for each compound binding to either mGlu5 or mGlu5 StaR(569-836)-T4L were compared using an unpaired two-tailed t-test. For each compound there was no
significant change in affinity at mGlu5 StaR(569-836)-T4L compared to mGlu5. Data shown are pKi ± s.e.m. for three independent experiments.

Extended Data Table 2 | Data collection and refinement statistics for
mGlu5 StaR(569-836)-T4L

Data collection
Number of crystals                       5
Space group                              C121
Cell dimensions
  a, b, c (Å)                            143.2, 43.6, 82.0
  α, β, γ (°)                            90.0, 99.4, 90.0
Number of reflections measured           39243
Number of unique reflections             14800
Resolution (Å)                           34.51-2.60 (2.72-2.60)
Rmerge                                   0.116 (0.754)
Mean I/sd(I)                             7.4 (1.8)
Completeness (%)                         94.9 (94.8)
Redundancy                               2.7 (2.6)

Refinement
Resolution (Å)                           29.74-2.60
Number of reflections (test set)         14093 (691)
Rwork/Rfree                              0.2392 / 0.2747
Number of atoms
  All                                    3355
  Proteins                               3212
  Ligand                                 23
  Others (Lipids, ions, waters)          120
Average B factors (Å²)
  All                                    41.49
  mGlu5                                  39.25
  T4L lysozyme                           44.22
  Ligand                                 34.58
  Others (Lipid, ion, water)             49.22
RMSD
  Bond lengths (Å)                       0.003
  Bond angles (°)                        0.727
Ramachandran statistics
  Favored regions (%)                    98.0
  Allowed regions (%)                    2.0
  Outliers (%)                           0.0
MolProbity overall score (percentile)    1.17 (100th percentile)


Values in parenthesis indicate highest resolution shell.

Numbers at top of alignment refer to residue numbers from mGlu5. Numbers in superscript also across the top of each TM alignment are given to amino acid residues and based on a modification of the
Ballesteros-Weinstein numbering system suggested for class C in ref. 1 with X.50 residues denoted by black columns. TM alignments cover core 15 residues of each TM helix as observed in the mGlu5 crystal
structure. Sequences from top to bottom of each alignment include: 8 metabotropic glutamate receptors; calcium sensing receptor (CaS); γ-aminobutyric acid B receptors (GABAB); GPR156/158/179;
taste receptors; retinoic acid-induced G1-G4; GPC6A. The pink highlighted column present in each TM alignment denotes the X.50 Ballesteros-Weinstein residue defined in ref. 39 on the basis of the alignment
of mGlu5 and CXCR4. Note X.50 numbering of both systems is in agreement across TM5.

Extended Data Table 4 | Structural differences between mGlu5, class B and selected class A GPCRs


Receptor         PDB ID   RMSD
                          All TMs   TM1         TM2        TM3        TM4        TM5        TM6        TM7
CRF1             4K5Y     3.6       2.0(0.5)    2.3(2.0)   2.9(1.6)   3.8(1.6)   2.4(0.6)   3.1(1.3)   6.9(2.5)
Rhodopsin        1F88     3.3       2.0(10.9)   3.7(2.9)   2.6(1.5)   3.5(1.0)   4.7(1.9)   2.5(1.5)   3.9(2.9)
β1 adrenergic    2VT4     3.2       2.5(0.8)    3.4(3.0)   2.2(1.5)   3.2(1.3)   3.6(1.9)   2.5(1.5)   4.5(3.0)
β2 adrenergic    2RH1     3.1       2.4(0.7)    3.5(3.0)   2.2(1.5)   3.0(1.0)   3.5(2.0)   2.6(1.6)   4.4(3.1)
β2 adrenergic    3SN6     3.5       2.8(1.0)    3.4(3.0)   2.8(1.4)   2.6(1.2)   3.8(1.9)   5.1(2.4)   3.6(2.3)
Adenosine A2A    2YDV     3.2       2.0(0.4)    3.4(3.0)   3.6(1.6)   2.9(0.9)   3.7(2.0)   2.8(1.6)   3.7(2.1)
Adenosine A2A    3PWH     3.1       1.5(0.9)    3.2(3.0)   2.8(1.8)   3.2(1.3)   3.9(2.1)   2.4(1.5)   4.4(2.9)
CXCR4            3ODU     3.4       2.3(1.4)    3.1(2.7)   2.4(1.5)   3.5(2.3)   4.4(2.0)   3.1(1.5)   4.8(3.0)
Dopamine D3      3PBL     3.1       1.7(0.7)    3.2(2.9)   2.6(1.4)   2.9(1.0)   3.7(1.9)   2.5(1.4)   4.5(2.8)
Histamine H1     3RZE     3.1       1.9(0.7)    3.3(2.9)   2.2(1.5)   2.9(1.3)   4.2(1.8)   2.6(1.7)   4.0(2.8)
Muscarinic M2    3UON     3.3       2.1(0.8)    3.5(2.9)   1.9(1.4)   3.5(1.2)   4.5(1.6)   2.9(1.4)   4.1(2.6)
Muscarinic M3    4DAJ     3.1       1.8(0.6)    3.6(2.9)   2.1(1.6)   3.2(1.1)   4.3(1.7)   3.0(2.1)   3.6(2.6)
S1P1             3V2Y     2.9       2.4(0.6)    2.5(1.9)   2.7(1.9)   3.2(1.0)   3.0(0.7)   2.1(1.4)   4.1(3.0)
κ opioid         4DJH     3.4       3.6(1.1)    2.9(2.3)   2.7(1.8)   3.5(1.5)   4.5(2.1)   2.7(1.4)   3.9(2.3)
μ opioid         4DKL     3.4       2.3(0.9)    2.8(2.3)   2.7(1.8)   3.1(1.4)   4.8(2.0)   2.8(1.5)   4.7(2.9)
δ opioid         4EJ4     3.4       2.5(0.8)    3.0(2.6)   2.5(1.7)   3.4(1.3)   4.6(2.0)   2.7(1.6)   4.7(2.9)
ORL-1            4EA3     3.2       2.4(1.0)    3.0(2.5)   2.4(1.6)   3.2(1.3)   4.6(1.9)   2.6(1.5)   4.5(2.9)
PAR-1            3VW7     3.5       3.0(1.5)    3.2(2.5)   3.5(1.8)   2.8(1.1)   4.6(0.9)   3.5(2.0)   3.7(3.0)
NTSR1            4GRV     3.3       3.4(0.9)    3.1(2.7)   2.9(1.3)   3.0(1.1)   3.7(2.0)   3.6(1.4)   3.3(2.0)

Backbone r.m.s.d. values were calculated after a global superposition using a core TM region shared by class A and C GPCRs and mGlu5 as defined by mGlu5 residues 579-603, 613-638, 642-674, 692-712, 740-
761, 770-790 and 797-819, corresponding to the class A Ballesteros-Weinstein residues 1.35-1.59, 2.38-2.63, 3.23-3.55, 4.41-4.61, 5.40-5.61, 6.33-6.53, and 7.33-7.55. Backbone r.m.s.d. values in brackets
were calculated after local superposition of the individual TM helices. Chain C was used for CRF1R and chain A for the class A GPCRs.

Extended Data Table 5 | Alignment and comparison of allosteric binding site residues across the 8 human metabotropic glutamate receptors


Position    mGlu5 residue number
2.45        -
2.46        625
2.49        628
3.36        651
3.37        652
3.39        654
3.40        655
3.43        658
3.44        659
5.43        743
5.44        744
5.47        747
5.51        751
6.46        781
6.49        784
6.50        785
6.53        788
7.32        802
7.35        805
7.36        806
7.39        809
7.40        810
7.43        813

[The residue number for position 2.45 and the residue-letter columns (3)-(14) are not recoverable from the source.]

All sequences are from Homo sapiens apart from column 11 which is from Drosophila melanogaster. Letters in columns 3-11 are single letter amino acid code. Columns are from left: (1) PIN number based on a
modification of the Ballesteros-Weinstein numbering system suggested for class C in ref. 1; (2) mGlu5 residue number; (3) mGlu5 residue, single letter code; (4) mGlu1 equivalent residue, single letter code;
(5) mGlu2 equivalent residue, single letter code; (6) mGlu3 equivalent residue, single letter code; (7) mGlu4 equivalent residue, single letter code; (8) mGlu6 equivalent residue, single letter code; (9) mGlu7
equivalent residue, single letter code; (10) mGlu8 equivalent residue, single letter code; (11) Drosophila mGlu equivalent residue, single letter code; (12) overall residue conservation at equivalent positions; (13) overall
residue quality score at equivalent positions. The quality score is inversely proportional to the average cost of all pairs of mutations observed in a particular column of the alignment; a high alignment quality score
for a column would suggest that there are no mutations, or most mutations observed are favourable; (14) overall alignment consensus.

Velocity anti-correlation of diametrically opposed
galaxy satellites in the low-redshift Universe 


Neil G. Ibata, Rodrigo A. Ibata, Benoit Famaey & Geraint F. Lewis


Recent work has shown that the Milky Way and the Andromeda gal- 
axies both possess the unexpected property that their dwarf satellite gal- 
axies are aligned in thin and kinematically coherent planar structures’”’. 
It is interesting to evaluate the incidence of such planar structures in 
the larger galactic population, because the Local Group may not be 
a representative environment. Here we report measurements of the 
velocities of pairs of diametrically opposed satellite galaxies. In the 
local Universe (redshift z < 0.05), we find that satellite pairs out to a
distance of 150 kiloparsecs from the galactic centre are preferentially 
anti-correlated in their velocities (99.994 per cent confidence level), 
and that the distribution of galaxies in the larger-scale environment 
(out to distances of about 2 megaparsecs) is strongly clumped along 
the axis joining the inner satellite pair (>7o confidence). This may in- 
dicate that planes of co-rotating satellites, similar to those seen around 
the Andromeda galaxy, are ubiquitous, and their coherent motion sug- 
gests that they represent a substantial repository of angular momen- 
tum on scales of about 100 kiloparsecs. 

The satellite galaxies of the Milky Way have long been known to be 
preferentially located close to a plane, but this observation could be dismissed
as a mere coincidence. However, as faint galaxies were uncovered
in the Sloan Digital Sky Survey’ (SDSS), it became clear that our Galaxy 
hosts a planar structure of satellites with a close-to-polar orientation’?”’. 
The complications due to spatial incompleteness of satellite samples that 
complicate analyses in the Milky Way are largely alleviated when observ- 
ing the next-nearest giant galaxy, Andromeda (M31). The presence of 
a vast plane of co-rotating dwarf galaxies was recently detected in that 
galaxy as a result of new photometric’? and spectroscopic’*”” surveys 
of its halo. A full 50% of the dwarf galaxies around M31 belong to this 
structure’®. The satellites in the plane have galactocentric distances as 
great as ~300 kpc, yet they display very small scatter (12.6 kpc) in the di- 
rection perpendicular to the plane, and they possess coherent kinematics, 
suggestive of common rotation about their host. Recent analyses have 
also uncovered possible galaxy alignments in the M81 system’® and in 
the NGC 3109 association”’. Such satellite alignments may arise natu- 
rally if dwarf galaxies formed from tidal debris left over from ancient 
galaxy mergers”, but this scenario remains difficult to reconcile with 
the high dark matter content deduced for these objects". Although the 
presence of the planar structures in the Local Group is now firmly estab- 
lished, they may represent a fossil of the particular dynamical forma- 
tion history of the Milky Way and Andromeda systems”. It is therefore 
necessary to investigate more distant systems to ascertain the true sig- 
nificance of these local detections. 

We devised a test (Methods) to quantify the incidence of planar systems 
of satellites. Beyond a few megaparsecs from Earth, reliable and accurate 
relative distance measurements are beyond our present technological 
capabilities; this means that we have to deal with two-dimensional pro- 
jections of galactic systems, possessing only the radial component of 
velocity. We take the M31 system as a template for the search for sat- 
ellite alignments, because its global structure and dynamics are at pre- 
sent the best understood. Half of that system shows coherent rotation, 
which means that for orientations that are not exactly face-on to Earth, 


satellites on either side of the host galaxy as seen on the sky will in general 
have opposite velocities relative to the host (that is, the velocities of the 
satellites will be anti-correlated). This motivates the following simple 
detection method: for each satellite around a given host, we check whether 
it possesses a counterpart that is located on the opposite side of the host 
galaxy to within a certain tolerance angle α (Fig. 1a), and, if it does, we
determine whether the pair has correlated or anti-correlated velocities. 
With circular orbits, no contamination and perfect data, all pairs will 
be anti-correlated if they all belong to co-rotating planes. 
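
The detection method described above can be sketched as follows. This is one possible reading of the geometry, not the authors' code: each satellite is given as a projected offset (x, y) from its host and a line-of-sight velocity v relative to the host, and a second satellite counts as diametrically opposed if its position angle lies within the tolerance angle of the direction opposite the first. The function name and the example values are hypothetical.

```python
import math


def classify_opposed_pairs(satellites, alpha_deg):
    """Find diametrically opposed satellite pairs and classify their velocities.

    satellites: list of (x, y, v) tuples, where (x, y) is the projected offset
    from the host and v is the line-of-sight velocity relative to the host.
    alpha_deg: tolerance angle in degrees.
    Returns a list of (i, j, label) with label 'anti-correlated' or 'correlated'.
    """
    pairs = []
    for i in range(len(satellites)):
        xi, yi, vi = satellites[i]
        # Direction pointing from the host away from satellite i.
        opposite = math.atan2(-yi, -xi)
        for j in range(i + 1, len(satellites)):
            xj, yj, vj = satellites[j]
            diff = math.atan2(yj, xj) - opposite
            # Wrap the angular difference into [-pi, pi).
            diff = (diff + math.pi) % (2 * math.pi) - math.pi
            if abs(math.degrees(diff)) <= alpha_deg:
                # Opposite signs of relative velocity mean anti-correlation.
                label = "anti-correlated" if vi * vj < 0 else "correlated"
                pairs.append((i, j, label))
    return pairs


# Hypothetical system: two nearly opposed satellites plus one off-axis satellite.
example = [(100, 0, 80), (-90, 5, -60), (0, 120, 40)]
result = classify_opposed_pairs(example, alpha_deg=8)
```

The sketch omits the selection cuts (magnitude, radius, velocity window and minimum velocity difference) that the text applies before pairs are counted.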

As a control, we first apply this test to the large Millennium II simu- 
lation (MS2) of structure formation and evolution”**, which reflects 
our best theories of galaxy formation in Λ cold dark matter (ΛCDM)
cosmology”. We find that diametrically opposite pairs of bright satellites 
selected from that simulation (Methods) show no kinematic coherence, 
with roughly equal numbers of correlated and anti-correlated pairs for 
all α (filled circles in Fig. 1b). However, with a very different satellite selection
strategy, a slight preference for co-rotating satellites can be found.
To analyse the behaviour of our statistic in the presence of a contaminat- 
ing background, we forced different fractions of satellites around M31- 
like hosts in MS2 (Methods) to lie within a randomly chosen rotating 
plane. Figure 1c shows the anti-correlation of the satellite pairs as a func- 
tion of the dominance of planar configurations; evidently, a measure of 
the fraction of anti-correlated pairs in real galaxies has the potential to 
reveal whether planar satellite alignments are common. 

We therefore applied this test to the SDSS, which at present gives the 
most complete view of the nearby universe. Because we wish to investigate 
the environment around galaxies similar to the Milky Way and Andromeda, 
we select isolated host galaxies (no brighter neighbour within a distance 
of 500 kpc or with a velocity that differs by less than 1,500 km s⁻¹) with
r-band absolute magnitudes in the range −23 mag ≤ M_r ≤ −20 mag
from the NYU Value-Added Galaxy Catalog’. Cosmological parameters 
from the European Space Agency’s Planck mission are assumed”*. To 
ensure a clean sample of satellites, we select hosts up to a redshift of 
z=0.05 (beyond this, few faint satellites are detected), remove hosts 
closer than z = 0.002 to avoid noisy measurements, and remove all galaxies
with velocity uncertainties greater than 25 km s⁻¹. The satellites
themselves are any galaxies one magnitude or more fainter than the host,
but brighter than M_r = −16 mag, within the radial range 20 kpc < R <
150 kpc (again to be similar to the M31 analysis*). We further require 
that the satellites at projected distance R lie within a velocity of 
300exp[—(300 kpc/R)°*] kms! (Methods and Extended Data Fig. 1). 

As for the MS2 analysis, we retain only those satellites whose direction
of motion with respect to their hosts is well resolved; because we
impose an upper velocity uncertainty of 25 km s⁻¹ for both hosts and
satellites, we require a minimum velocity difference of 25√2 km s⁻¹.
There are 380 galaxy systems in the SDSS that pass these requirements. 
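
The minimum velocity difference can be read as the quadrature sum of the two 25 km s⁻¹ uncertainty limits, which gives 25√2 ≈ 35.4 km s⁻¹. The sketch below makes that arithmetic explicit; the function names are hypothetical and the interpretation as a quadrature sum is an assumption based on the stated uncertainty limits.

```python
import math


def min_velocity_difference(sigma_host, sigma_sat):
    """Combined 1-sigma uncertainty on a velocity difference (quadrature sum)."""
    return math.hypot(sigma_host, sigma_sat)


def direction_resolved(v_sat, v_host, sigma_max=25.0):
    """True if the satellite's motion relative to the host exceeds the threshold.

    Assumes both host and satellite carry the maximum allowed uncertainty.
    """
    threshold = min_velocity_difference(sigma_max, sigma_max)
    return abs(v_sat - v_host) >= threshold


threshold = min_velocity_difference(25.0, 25.0)
```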

Figure 1 | Satellite correlation test. a, Sketch of the satellite selection process.
b, The ratio of anti-correlated to correlated satellite pairs in MS2
(rejecting or including ‘orphan’ galaxies; see Methods) is consistently very close
to 1, independently of α. However, the simple toy model (Methods) shows a
decrease in the ratio with increasing α. c, Fraction of anti-correlated galaxy pairs
as a function of the fraction of satellites in the co-rotating planar population
(using α = 8°, the most significant peak in Fig. 2c). In the absence of a planar
component, equal numbers of correlated and anti-correlated satellites
should be detected. However, the ratio increases as expected as we increase the
fraction of satellites in the planar component.

Various choices for α are examined in Fig. 2. As our toy model shows
(Methods and Fig. 1b), the highest contrast between anti-correlated
and correlated satellite pairs should be found for small α. There is an
inevitable trade-off between the number of satellite pairs that pass the
selection criteria and the contamination fraction suffered by the sample.
Strict selection gives good contrast, but poor statistics; lenient selection
gives poor contrast, but good statistics. Optimal significance will therefore
lie at an intermediate tolerance angle, but, given the unknown density
and kinematic properties of both the normal and putative disk-like
satellite populations, we believe that the best strategy is to allow the data
themselves to guide our choice of α. Figure 2a shows that with α = 8°,
20 of 22 pairs are anti-correlated, implying a significance of 99.994%
(>4σ); these systems are listed in Table 1. (With a less strict velocity
cut-off, of 20√2 km s⁻¹, 21 of 23 pairs are anti-correlated.) High significance
is found out to α = 15°, although, as expected in the presence of
non-planar ‘contaminants’, the significance decreases with increasing
α. By comparison with the simulations where a disk population was added
to MS2 (Fig. 1c), the observed ratio of anti-correlated pairs to correlated
pairs (>2.7 at 99% confidence) at α = 8° suggests that >60% of satellites
reside in planes, although we stress that this constraint is weak because
this disk model is simplistic. Thus, we have found that the average giant
galaxy in the SDSS is consistent with our M31 template. Although the
SDSS spectroscopic observing strategy produced certain spatial biases,
it seems extremely improbable that such biases could artificially cause
an overabundance of anti-correlated satellite pairs as found here.
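
The quoted significance for 20 anti-correlated pairs out of 22 can be reproduced with a one-sided binomial tail, assuming that each pair is equally likely to be correlated or anti-correlated when no planar population is present. The text does not state which test was used, so this is a plausibility check under that assumption, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(n_pairs, n_anti, p=0.5):
    """Probability of observing at least n_anti anti-correlated pairs by chance."""
    return sum(
        comb(n_pairs, k) * p**k * (1 - p) ** (n_pairs - k)
        for k in range(n_anti, n_pairs + 1)
    )


p_value = binomial_tail(22, 20)           # chance probability under the null
significance = 100.0 * (1.0 - p_value)    # confidence level in per cent
```

Under these assumptions the tail probability is 254/2²², about 6 × 10⁻⁵, which corresponds to the 99.994% confidence level given in the text.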

Figure 3 highlights a possible correlation between the direction defined 
by the satellite alignment and the large-scale structure surrounding the 
hosts. An elongated overdensity of galaxies appears to be aligned along 
the axis of the satellite pair, extending out to ~10 times the radial dis- 
tance of the selected pair (Fig. 3a, b). This is consistent with what we see 
in the Local Group, where the M31 satellite alignment points to within 
1° of the Milky Way. Although these filaments of galaxies are much 
thicker than the planes around host galaxies, it is possible that this reveals 
the influence of large-scale structure on the dynamics of the smaller sat- 
ellite system. Furthermore, in MS2 the larger-scale environment around 
anti-correlated pairs shows no strong preferential direction, and neither 
does the environment around SDSS correlated pairs (Fig. 3c). Although 
it remains possible that the large-scale elongation of the galaxy distri- 
bution along the direction of the galaxy pairs is an artefact of the SDSS 
target selection, this seems unlikely given the random orientation of the 
satellite pairs on the sky. 


a T T T b T T T c T T T 
40 | ce 4 
xe} 
2 P= 
acs Go 
S 
30 7 me oS 
oO 8 10F 1 ee 
a = of 
£ ® oo 
5 8 OE 
2 20 13 £9 2 
S o 2 
fe) ==] 
= 5 | BS 
10 1g 2) 1 
@ Anti-correlated 
0 5 Correlated 6 5 
0 10 20 30 0 10 20 30 0 10 20 30 
a(°) a (°) a (°) 


Figure 2 | Anti-correlated satellites in the SDSS. a, The number of satellite
pairs that have correlated and anti-correlated velocities is shown as a function of
the tolerance angle. There is a clear surplus of anti-correlated pairs for all
angles considered. b, The ratio of anti-correlated to correlated satellite
pairs shows an overall decrease with increasing tolerance angle, reaching 2.4 at
15°, which we consider the maximum useful opening angle given the low
number of satellite pairs in the SDSS. c, The significance of the excess of
anti-correlated satellite pairs. The most significant peak has significance >4σ at an
opening angle of 8°.
Table 1 | Host and satellite galaxy parameters

Units: RA and dec. in degrees; M_r in mag; v in km s⁻¹; λ|L*| in 10¹⁰ M⊙ km s⁻¹ kpc.

     z        RA^h      dec.^h   M_r^h       RA^S1     dec.^S1  M_r^S1   v^S1       RA^S2     dec.^S2  M_r^S2   v^S2   λ|L*|
0.0324  318.182771   -0.387524  -20.01  318.172302   -0.395010  -17.46     43  318.218964   -0.366205  -18.31   -131     1.2
0.0395    3.140439   -0.048469  -22.00    3.102046   -0.071687  -18.63   -113    3.166931   -0.036225  -19.74     64     5.9
0.0395    3.140439   -0.048469  -22.00    3.102046   -0.071687  -18.63   -113    3.182354   -0.028088  -19.35     66     9.7
0.0197  181.112776    1.895961  -22.84  181.153046    1.892648  -17.90    189  181.054535    1.900154  -19.29   -159    12.1
0.0476  129.700626    4.126124  -22.02  129.688538    4.113726  -20.12   -182  129.715744    4.139352  -19.39    161     8.3
0.0443  230.341251    5.066835  -22.23  230.337646    5.045036  -18.92    107  230.342621    5.082857  -19.08   -116     3.8
0.0227  203.116383    7.316453  -22.46  203.134415    7.294215  -20.49    -52  203.106171    7.331840  -18.68    195     7.3
0.0197  210.702609    9.341378  -22.08  210.619705    9.315668  -17.20   -180  210.779785    9.363044  -20.82     86    31.4
0.0360  244.995300   10.493549  -22.41  244.980530   10.486047  -18.54    -57  245.008408   10.499287  -21.12    231    30.8
0.0266  149.548260   15.946962  -21.78  149.566422   15.919011  -18.66    211  149.534317   15.963491  -20.39    -96     9.8
0.0390  241.663921   17.761186  -22.50  241.681305   17.727509  -18.62   -180  241.651321   17.796453  -20.56     40    14.7
0.0208  176.008968   19.949823  -22.68  176.050766   19.942766  -17.97    -41  175.986832   19.955805  -18.79     62     1.4
0.0371  156.234227   20.402742  -22.05  156.282028   20.396730  -18.59     53  156.221756   20.403419  -19.77   -125     5.8
0.0170  157.320107   26.099230  -21.48  157.346039   26.070440  -18.94   -107  157.256622   26.153496  -18.20     87     2.8
0.0272  168.791680   31.033707  -22.27  168.785461   31.002020  -20.08    212  168.804108   31.079819  -17.95   -166    23.8
0.0428  170.378180   33.957669  -22.57  170.338287   33.955376  -19.55   -111  170.402756   33.958282  -18.79    -98    -6.6
0.0277  167.171535   36.161068  -21.50  167.216934   36.116596  -18.07    -90  167.115128   36.206646  -19.52     94     7.8
0.0316  247.216940   42.812009  -21.74  247.204544   42.805214  -19.21   -116  247.243317   42.828125  -20.59     45     7.0
0.0369  164.010031   44.395565  -21.29  163.993439   44.380817  -18.96    111  164.039291   44.428234  -19.13    159    -8.3
0.0375  128.717100   44.637116  -21.91  128.674500   44.593529  -19.51    134  128.751175   44.671597  -19.02   -156     9.9
0.0245  195.793476   47.393865  -21.55  195.792770   47.381126  -19.08    -97  195.788147   47.447929  -18.75     51     2.1
0.0233  143.529090   67.431139  -21.35  143.660706   67.374802  -18.20    -90  143.446182   67.477814  -18.46     52     2.1


Redshifts (z), positions (right ascension (RA) and declination (dec.)), absolute magnitudes (M_r) and radial velocities (v) of the hosts (superscript ‘h’) and the satellites (superscripts ‘S1’ and ‘S2’), for the sample
selected with a tolerance angle of α = 8°. The final column lists the sums of the angular momenta of the stellar components of the two satellites: |L*| = |L*^S1 + L*^S2|. The constant λ takes the value 1 if the pair
have anti-correlated velocities and takes the value −1 if the velocities are correlated.


Table 1 also lists the angular momenta of the satellite pairs, calculated
using the projected distances, line-of-sight velocities and estimated
stellar masses of these galaxies. For comparison, the total angular
momentum of our Galaxy in stars is |L*| ≈ 9 × 10¹⁰ M⊙ km s⁻¹ kpc (where M⊙
is the solar mass and we approximate the Milky Way as an exponential
disk). Thus, the angular momentum contained in the stellar
component of the aligned satellites we have identified (mean of the α = 8°
sample: ⟨λ|L*|⟩ = 8.3 × 10¹⁰ M⊙ km s⁻¹ kpc, where λ indicates sign as
explained in Table 1) is comparable to the angular momentum in a giant
galaxy’s stellar disk. This suggests that these coherent structures make a
substantial contribution to the angular momentum budget on galaxy
halo scales (~100 kpc), although a better understanding of their
incidence and physical properties is required to quantify their importance.

Our tests were constructed using MS2 as a control sample to predict
what should have been a priori expected in ΛCDM cosmology. Just as
this paradigm did not predict the planes observed in the Local Group,
it did not a priori predict the velocity correlations presented here. It should
be noted, however, that MS2 contains only dark matter, and future large
cosmological simulations that include detailed baryonic physics should
be performed to see if the discrepancies can be reduced. Although we are
still uncertain whether the pairs of satellites detected here actually form
part of kinematically coherent planes, their velocity anti-correlation,
alignment with larger-scale structures and high angular momentum are all
unexpected properties of the Universe that will require explanation.


METHODS SUMMARY 


Our test uses satellites that are diametrically opposite each other around their host 
to quantify the incidence of rotating planar alignments. The signature of coherent 
rotation is an enhancement in the number of anti-correlated satellites. Using a small 
tolerance angle (Fig. 1a) and a minimum velocity difference, samples can be gen- 
erated with a higher probability of containing edge-on planar structures, if they are 
present. We first test this method on a simple toy model, to show that it behaves as 
expected for particular choices of the tolerance angle parameter α (Fig. 1b): the
contrast of the planar component is seen to decrease with increasing α, suggesting
that small values of α should preferably be used for the tests. To construct a more
Figure 3 | Correlation with environment. a, Superposition of SDSS galaxies
(within 500 km s⁻¹) that surround the hosts of the satellite pairs with
anti-correlated velocities (using α = 15°). Each field is rotated so that the receding
satellite lies on the positive abscissa. A clear horizontal feature is found out to
~2 Mpc; this result remains robust for various subsamples and parameter
choices. (The black disk shows a radius of 150 kpc.) b, The angular distribution
of the galaxies in a, rejecting galaxies within R < 150 kpc. The significances of
the peaks for the R < 1.0, 1.5 and 2 Mpc samples are 3.7σ, 4.8σ and 7.1σ,
respectively. c, Applying the same procedure to the region around SDSS
correlated pairs (red lines, using α = 20° to build up better statistics) shows
minimal correlation, as does the environment around anti-correlated pairs in
MS2 (purple lines).
realistic model, we select galaxies and their satellites from the MS2 cosmological
simulation, and reassign some of the satellites to planar structures. The selection 
process for hosts and satellites is kept as close as possible to the process applied to 
the observed SDSS sample. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Simple kinematic test on diametrically opposite satellites. The simple statistical 
test we have developed is devised to allow us to quantify the frequency of satellites 
belonging to disk-like structures. We use primarily the distinctive property of a 
rotating disk-like structure that objects on opposing sides have anti-correlated 
velocities. The expectation from observations of M31 is that any such structures 
are superposed on a ‘contaminating’ population of ‘normal’ satellites that appear, 
to first approximation, to have a spherically symmetric distribution around the 
host. The presence of such a contaminating population, together with the fact that 
most galaxies beyond the Local Group have only a small number of satellites with 
well-measured velocities, means that at present we can test only for the alignments 
in a statistical manner on a sample of hosts. 

Thus, the challenge is to devise a means to enhance the contrast of the putative 
disk over a potentially dominant spherical population. Because our viewing dir- 
ection on these distant systems cannot be special, on average, the median inclina- 
tion of any disk-like structures will be 60° (if we define an edge-on configuration to 
have inclination 90°). This naturally suggests using a test that makes use of the 
resulting elongation. However, we can bias a sample towards being more edge-on 
by selecting those systems with satellites that have radial velocities significantly 
different from their host galaxy. (Face-on disk-like alignments will have zero 
velocity difference, as viewed along our line of sight.) 
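
The median inclination of 60° quoted above can be recovered with a short derivation. The sketch below assumes only that the disk normals are oriented isotropically with respect to the line of sight; the symbol i_med is introduced here for illustration.

```latex
% Isotropic orientations: the inclination i (0 = face-on, 90 deg = edge-on)
% has probability density p(i) = sin i on [0, pi/2].
\begin{align}
  P(<i) &= \int_0^{i} \sin i' \,\mathrm{d}i' = 1 - \cos i, \\
  P(<i_{\mathrm{med}}) &= \tfrac{1}{2}
  \;\Rightarrow\; \cos i_{\mathrm{med}} = \tfrac{1}{2}
  \;\Rightarrow\; i_{\mathrm{med}} = 60^{\circ}.
\end{align}
```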

As sketched in Fig. 1a, we consider systems consisting of a massive host galaxy 
harbouring at least one pair of satellites. Picking each satellite in turn, we determine 
whether another satellite lies on the opposite side of the host within a tolerance 
angle α. If both satellites possess a velocity that is significantly different from that of
their host, the pair is retained for study. Motivated by the galaxy velocity
uncertainties in the SDSS, we selected this minimum velocity difference parameter to be
Δv_min = 25√2 km s⁻¹ in all calculations presented here (the results are
qualitatively very similar for 30 km s⁻¹ < Δv_min < 40 km s⁻¹).
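
As an illustration of the selection just described, the following minimal sketch classifies one host with two satellites. It is not the authors' code: the function names are hypothetical, the tolerance angle is assumed to be the angle by which the second satellite misses the point diametrically opposite the first (as seen from the host), velocities are assumed to be measured relative to the host, and a flat-sky approximation is used that ignores RA wrap-around at 0°/360°.

```python
import math

DV_MIN = 25.0 * math.sqrt(2.0)  # km/s, minimum |velocity offset| from the host


def sky_offset(ra, dec, ra0, dec0):
    """Small-angle (east, north) offset in degrees from a reference position."""
    east = (ra - ra0) * math.cos(math.radians(dec0))
    north = dec - dec0
    return east, north


def opposition_error(host, sat1, sat2):
    """Angle (deg) by which sat2 misses the point diametrically opposite sat1."""
    e1, n1 = sky_offset(*sat1, *host)
    e2, n2 = sky_offset(*sat2, *host)
    pa1 = math.atan2(e1, n1)  # position angle, east of north
    pa2 = math.atan2(e2, n2)
    sep = abs(math.degrees(pa1 - pa2)) % 360.0
    sep = min(sep, 360.0 - sep)  # fold the separation into [0, 180]
    return 180.0 - sep


def classify_pair(host, sat1, v1, sat2, v2, alpha_deg, dv_min=DV_MIN):
    """Return 'anti-correlated', 'correlated', or None if the pair fails a cut."""
    if opposition_error(host, sat1, sat2) > alpha_deg:
        return None  # not close enough to diametrically opposite
    if abs(v1) <= dv_min or abs(v2) <= dv_min:
        return None  # velocity offset too small to be significant
    return "anti-correlated" if v1 * v2 < 0 else "correlated"


# Second row of Table 1 (velocities assumed relative to the host).
host = (3.140439, -0.048469)
print(classify_pair(host, (3.102046, -0.071687), -113.0,
                    (3.166931, -0.036225), 64.0, alpha_deg=8.0))
```

For this row the two satellites miss exact opposition by roughly 6°, inside the 8° tolerance, and their velocity offsets have opposite signs.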

To explore how the method works, we first constructed a very simple test con- 
figuration, containing 50% of satellites in a spherical population and 50% in a disk- 
like structure. Both structures were populated with satellites with uniform probability 
at radii between 20 and 150 kpc. The satellites in the disk rotate at 40 km s⁻¹
independent of radius, while the spherical population has an isotropic velocity dispersion
of 70 km s⁻¹. This toy model is then viewed from a random direction, and two
satellites are selected at random beyond a projected radius of 20 kpc. If the two
satellites lie on opposite sides of the host to within the chosen tolerance angle, and
if both satellites have a velocity difference with respect to their host of more than
Δv_min, the pair is retained, and we determine whether the velocities are correlated or
anti-correlated. The procedure is repeated 2,000 times for each tolerance angle value.
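
The toy model described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the authors' code: radii are drawn uniformly in r, the 70 km s⁻¹ dispersion is applied to each velocity component, the host is at rest at the origin, each satellite joins the disk with probability `disk_fraction`, and the loop runs until `n_pairs` pairs are retained. All function names are hypothetical.

```python
import math
import random

DV_MIN = 25.0 * math.sqrt(2.0)  # km/s


def random_unit_vector(rng):
    """Draw a direction uniformly on the sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def draw_satellite(rng, disk_fraction):
    """Return (position in kpc, velocity in km/s) of one toy satellite."""
    r = rng.uniform(20.0, 150.0)
    if rng.random() < disk_fraction:
        # Disk member: circular orbit at 40 km/s in the z = 0 plane.
        phi = rng.uniform(0.0, 2.0 * math.pi)
        pos = (r * math.cos(phi), r * math.sin(phi), 0.0)
        vel = (-40.0 * math.sin(phi), 40.0 * math.cos(phi), 0.0)
    else:
        # Spherical member: isotropic position, Gaussian velocity components.
        u = random_unit_vector(rng)
        pos = (r * u[0], r * u[1], r * u[2])
        vel = tuple(rng.gauss(0.0, 70.0) for _ in range(3))
    return pos, vel


def observe(pos, vel, los):
    """Project onto the sky plane; return (sky position, line-of-sight velocity)."""
    depth = dot(pos, los)
    sky = tuple(p - depth * n for p, n in zip(pos, los))
    return sky, dot(vel, los)


def count_pairs(alpha_deg, disk_fraction, n_pairs, seed=0):
    """Return (n_anti, n_corr) among the first n_pairs retained pairs."""
    rng = random.Random(seed)
    cos_limit = -math.cos(math.radians(alpha_deg))
    n_anti = n_corr = 0
    while n_anti + n_corr < n_pairs:
        los = random_unit_vector(rng)  # random viewing direction
        p1, v1 = observe(*draw_satellite(rng, disk_fraction), los)
        p2, v2 = observe(*draw_satellite(rng, disk_fraction), los)
        r1, r2 = math.sqrt(dot(p1, p1)), math.sqrt(dot(p2, p2))
        if r1 < 20.0 or r2 < 20.0:
            continue  # inside the 20 kpc projected-radius cut
        if abs(v1) <= DV_MIN or abs(v2) <= DV_MIN:
            continue  # velocity offset too small
        if dot(p1, p2) > cos_limit * r1 * r2:
            continue  # not within alpha of diametrically opposite
        if v1 * v2 < 0:
            n_anti += 1
        else:
            n_corr += 1
    return n_anti, n_corr


anti, corr = count_pairs(alpha_deg=10.0, disk_fraction=0.5, n_pairs=500, seed=1)
print(anti, corr)
```

With `disk_fraction=0.0` the two counts should be statistically equal, and with `disk_fraction=1.0` every retained pair is anti-correlated for small tolerance angles, which is the contrast the test exploits.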

The open triangles in Fig. 1b show the ratio of the number of anti-correlated to 
correlated pairs as a function of the tolerance angle α. As this simple model shows,
we expect the highest contrast to be found at small tolerance angles. The selection
of diametrically opposite galaxies, together with the minimum velocity criterion,
ensures that configurations close to edge-on are preferentially selected; for this toy
model, the average inclination angle for the α = 10° sample is 80°.
Construction of artificial satellite systems from the Millennium II simulation. 
To explore further the reliability of our method to uncover genuine planar satellite 
alignments, we decided to construct artificial galaxy systems that we could run and 
test our algorithm on. For this purpose, the Millennium II simulation (in particular 
with the semi-analytic modelling in ref. 24) provides an ideal view of the expected 
distribution of galaxies and their satellites in a very large (10⁶ h⁻³ Mpc³) volume in
a ΛCDM universe (h is the dimensionless Hubble parameter). The catalogue lists
the absolute r-band magnitudes, total galaxy masses, positions and velocities that 
are necessary for our comparison with observations. 
To create the random views of galaxies derived from the simulation, we proceed
as follows. We first choose a random direction from which we will view the galaxies
in the Millennium II volume. A list of candidate host galaxies is generated by
selecting those objects with absolute magnitudes in the range −23 ≤ M_r ≤ −20
(identical to the selection from the real SDSS data presented in the main text). We
examine each of the candidate hosts in turn, placing the host to be studied at
10,000 km s⁻¹ (the mean velocity of the SDSS sample), and then making sure that
it appears isolated in projection with no brighter neighbour within 0.5 Mpc, with
velocity differences less than 1,500 km s⁻¹.

We make a list of all the neighbouring galaxies within a projected distance of 
500 kpc and a velocity of 1,500 km s⁻¹ that are at least one magnitude fainter than
the host, but that are brighter than M_r = −16 mag; we will refer to these objects as
‘satellites’. (We reject ‘orphan’ galaxies, which are systems whose parent subhaloes 
are no longer resolved in the Millennium II simulation, but, as Fig. 1b, c shows, our 
results remain qualitatively identical if these objects are included.) For each host, 
we then randomly draw a vector to define the normal to the planar population, and 
we go through the list of satellites, randomly assigning them to the planar popu- 
lation, according to the desired planar component fraction that we wish to test for. 
Clearly, when testing for a planar fraction of zero, the Millennium II positions and 
velocities remain unaltered. For those satellites that are assigned thus to the planar 
component, we keep the galactocentric distance that they had in the Millennium II 
simulation, but place them onto the plane with a random azimuthal angle. The 
space velocities of the planar satellites are devised to give circular motions in the 
plane of the alignment, with the total velocity chosen from the circular velocity of a 
universal halo model of total mass given by the virial mass of the host.

A Gaussian random velocity of 15 km s⁻¹ (a representative value for the SDSS
velocity errors) is added to the radial velocity of the host and all its satellites. Having
thus reordered the three-dimensional positions of some of the satellites, we filter
the sample to keep those objects that lie at projected distances in the range
20 kpc < R < 150 kpc and that have a velocity difference of less than 300exp[−(300
kpc/R)°*] km s⁻¹ with respect to the host (Extended Data Fig. 1). The brightest two
satellites within the 20 kpc < R < 150 kpc annulus are selected for study. If the two
satellites lie on opposite sides of the host to within the chosen tolerance angle, and if
both satellites have a velocity difference with respect to their host of more than
Δv_min, then the pair is retained and we determine whether the velocities are
correlated or anti-correlated.

The entire process is then repeated for all other candidate hosts. We rerun the
entire procedure, selecting new initial viewing angles, as many times as necessary
until a total of 2,000 satellite pairs have been generated.
An alternative estimate of the fraction of satellites in planes. With the
parameter selections detailed in the text, and setting Δv_min = 25√2 km s⁻¹, there are
380 galaxy systems in the SDSS. Using α = 8°, we find 20 of 380 pairs to have
anti-correlated velocities, and 2 of 380 to have correlated velocities; that is, 22 of 380, or
5.8%, of all pairs are found with this tolerance angle. With the unaltered
Millennium II simulation (0% in a disk), we find 4.7% of pairs with α = 8°; this fraction
rises to 4.9% with a 50% disk component, and to 6.7% with a 100% disk
component. This suggests that the fraction of satellites in a planar component within the
SDSS is greater than 50%, consistent with the estimate given in the text, but the
simplicity of the disk model for the planar component prevents us from drawing
strong conclusions from this comparison.
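
The arithmetic in this comparison is easy to check. The sketch below reproduces the 5.8% pair fraction and, as an additional illustration that is not the significance estimate used in the paper, evaluates a simple one-sided binomial tail probability for finding at least 20 anti-correlated pairs out of 22 if both outcomes were equally likely. The function name `binomial_tail` is hypothetical.

```python
from math import comb

n_systems = 380          # SDSS systems passing the selection
n_anti, n_corr = 20, 2   # pairs found with a tolerance angle of 8 degrees

n_selected = n_anti + n_corr
fraction_selected = n_selected / n_systems


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))


p_value = binomial_tail(n_anti, n_selected)
print(f"{100 * fraction_selected:.1f}% of systems yield a pair")
print(f"P(>= {n_anti} of {n_selected} anti-correlated | p = 0.5) = {p_value:.2e}")
```

Under this null hypothesis the tail probability is about 6 × 10⁻⁵.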
Extended Data Figure 1 | Adopted velocity envelope relation. Dots mark the
distance–velocity distribution of satellites in the MS2 simulation that
surround isolated host galaxies of similar luminosity and mass to the Milky
Way. The empirical envelope relation shown in red (300exp[−(300 kpc/R)°*] km s⁻¹)
is used in our analysis as a means to reduce contamination from velocity outliers.
Misaligned protoplanetary disks in a young binary star system

Eric L. N. Jensen¹ & Rachel Akeson²


Many extrasolar planets follow orbits that differ from the nearly
coplanar and circular orbits found in our Solar System; their orbits
may be eccentric or inclined with respect to the host star’s equator,
and the population of giant planets orbiting close to their host stars
suggests appreciable orbital migration. There is at present no
consensus on what produces such orbits. Theoretical explanations often
invoke interactions with a binary companion star in an orbit that
is inclined relative to the planet’s orbital plane. Such mechanisms
require significant mutual inclinations between the planetary and
binary star orbital planes. The protoplanetary disks in a few young
binaries are misaligned, but often the measurements of these
misalignments are sensitive only to a small portion of the inner disk, and
the three-dimensional misalignment of the bulk of the planet-forming
disk mass has hitherto not been determined. Here we report that the
protoplanetary disks in the young binary system HK Tauri are
misaligned by 60 to 68 degrees, such that one or both of the disks are
significantly inclined to the binary orbital plane. Our results
demonstrate that the necessary conditions exist for misalignment-driven
mechanisms to modify planetary orbits, and that these conditions
are present at the time of planet formation, apparently because of
the binary formation process.

Although the three-dimensional orbital orientation is not yet mea- 
surable for any of the known extrasolar planets, measuring the orienta- 
tion of protoplanetary disks has the potential to provide information 
about planetary orbits during the planet formation process. Because these 
disks are hundreds of astronomical units (1 AU is the average Sun—Earth 
distance) in diameter, they can be spatially resolved at the 120-160 pc 
distances of the nearest star-forming regions. If the disks around both 
stars in a binary system can be shown to be misaligned, then it is clear 
that both cannot be aligned with the (usually undetermined) binary 
orbital plane. Indirect evidence of disk misalignment is provided by
misaligned jets and by polarimetry. More directly, images of
several young binary systems show that the disk around one star is nearly
edge-on to Earth. In some of these systems, infrared interferometry
or imaging constrains the inclination of the disk around the other star,
giving a lower limit on the degree of misalignment of the disks, although
the position angle of the disks is uncertain and the direction of rotation
is unknown. For systems with detectable millimetre-wavelength emission, 
measurement of Keplerian rotation in both disks in a binary system pro- 
vides the opportunity to measure the full three-dimensional orientation of 
the disks’ angular momenta. 

One such system is HK Tauri, a young binary system with a projected
separation of 2.4 arcsec (ref. 15), which is 386 AU at the distance (161 pc)
of this part of the Taurus clouds. Age estimates for this system range
from 1 to 4 Myr (ref. 17), placing it in the age range at which planet
formation is thought to occur. The southern, fainter star, HK Tau B, is
surrounded by a disk that blocks the starlight; the disk can thus be clearly
seen in scattered-light images at near-infrared and visible wavelengths
to be nearly edge-on; statistical arguments suggest that the disk is
unlikely to be completely aligned with the binary orbit. The northern star,
HK Tau A, has strong millimetre-wavelength continuum emission,
showing that it too is surrounded by disk material; however, because
the disk does not block the starlight, the disk cannot be seen in scattered
light owing to the brightness of the star. The striking difference in their
visible-light appearance shows that these two disks are not perfectly
aligned, but the degree of misalignment has not previously been known
because the molecular gas in the northern disk has not been resolved,
and a modest inclination difference would be sufficient to explain the
different scattered-light morphologies.

We observed HK Tau with the Atacama Large Millimeter Array (ALMA) 
at frequencies of 230.5 and 345.8 GHz, covering continuum emission 
from dust and line emission from the carbon monoxide (CO) 2-1 and 
3-2 rotational transitions, respectively (Methods). Both the northern 
and the southern components of the binary are clearly detected in the 
continuum and the CO line emission. The CO maps (Fig. 1) show the clear 
signature of rotating disks around each star, with one side of the disk 
redshifted and the other side blueshifted. The orientations of the two 
disks are significantly different, with the northern disk axis elongated 
nearly north-south, roughly 45° from the elongation axis of the south- 
ern disk. 

We used a Markov chain Monte Carlo analysis to fit disk models to 
our data to determine the three-dimensional spatial orientations of the 
disks (Methods). For HK Tau B, the disk orientation is well known from 
previous scattered-light imaging, and so we adopt from that work an
inclination i = 85° ± 1° and position angle PA = 42°. Although the disk
inclination and position angle were previously known, our imaging of
HK Tau B provides new spatial information because the direction of
disk rotation, apparent in Fig. 1b, removes a 180° ambiguity in the disk’s 
orientation. In what follows, we adopt the convention that the position 
angle is measured east of north and that the quoted position angle is that 
of the redshifted edge of the disk. Our model fitting reproduces the indi- 
vidual velocity channel images well for both sources (Fig. 2), allowing us 
to determine the position angle, inclination and direction of rotation of 
the molecular gas disk in the northern source, HK Tau A. The Markov 
chain Monte Carlo analysis gives PA = 352° ± 3° and i = 43° ± 5°
(Extended Data Fig. 1); all uncertainties are given as 68.3% credible intervals.

Measurement of the PA and the inclination of both disks lets us de- 
termine the angle between the two disks’ angular momentum vectors, 
with one ambiguity. Equal inclinations on either side of edge-on (i = 90°) 
will appear identical unless it can be determined which edge of the disk 
is nearer to the observer, for example if high-resolution imaging can 
determine that one edge of the disk is shadowed by a flared disk edge 
and the other is not. In the case of HK Tau B, this orientation is known 
from scattered-light imaging, but it is still unknown for HK Tau A. 
Combining the observational constraints, we find that the angle between 
the two disks’ angular momentum vectors is 60° ± 3° if both vectors
point to the same side of the sky plane, or 68° ± 3° if they do not (Fig. 3).
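
The two quoted angles can be reproduced from the best-fit inclinations and position angles with the standard spherical-trigonometry relation for the angle between two axes. This is a sketch of the point estimate only, not the authors' posterior analysis; the function name is hypothetical, and the "opposite side" case is modelled here by replacing the inclination of HK Tau A with 180° − i.

```python
import math


def disk_axis_angle(inc1_deg, pa1_deg, inc2_deg, pa2_deg):
    """Angle (deg) between two disk axes with given inclinations and position angles."""
    i1, i2 = math.radians(inc1_deg), math.radians(inc2_deg)
    dpa = math.radians(pa1_deg - pa2_deg)
    cos_delta = (math.cos(i1) * math.cos(i2)
                 + math.sin(i1) * math.sin(i2) * math.cos(dpa))
    # Clamp against rounding before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_delta))))


# HK Tau A: i = 43 deg, PA = 352 deg; HK Tau B: i = 85 deg, PA = 42 deg.
same_side = disk_axis_angle(43.0, 352.0, 85.0, 42.0)
opposite_side = disk_axis_angle(180.0 - 43.0, 352.0, 85.0, 42.0)
print(f"same side: {same_side:.1f} deg, opposite sides: {opposite_side:.1f} deg")
```

The two cases evaluate to approximately 60° and 68°, matching the values in the text.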

The clear misalignment between the two disks has important impli- 
cations for planet migration and orbital evolution, as well as for theories 
of binary formation. Although nothing in our observations constrains 
the orientation of the binary orbital plane, the fact that the two disks are 
misaligned with each other means that they cannot both be aligned 
Figure 1 | Observations of the CO(3-2) line in the HK Tau binary system.
a, Integrated gas emission from each disk, with contours at steps of
0.3 Jy beam⁻¹ km s⁻¹, three times the root mean squared noise in the maps; the


with the binary orbital plane. At least one of the disks must be misaligned 
with the binary orbit by 30° (half the total misalignment) or more. The 
misalignment for one or both disks is probably greater than this, because 
this minimum misalignment occurs only for one specific orientation of 
the binary orbit. This misalignment means that planets formed from 
these disks will be subject to Kozai-Lidov oscillations that may drive
changes in their eccentricities and orbital inclinations, or that the disks
themselves may be driven into misalignment with the stars’ rotation axes.
It is sometimes stated that only misalignments greater than the critical
angle of 39.2° can cause Kozai-Lidov oscillations, but it has recently
been shown that this is not strictly true if the body in the inner orbit is
relatively massive or has an eccentric orbit, or both. In any case, it is
quite likely that the inclination relative to the binary orbit exceeds this
critical angle for one or both of the disks; only 1.6% of all possible
binary orbits are inclined to both disks by less than 39.2° if the disks are
misaligned by 60°.
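
The 1.6% figure can be checked with a small Monte Carlo experiment. The sketch below assumes that the binary orbit's angular momentum vector is distributed isotropically over the full sphere and counts the fraction of directions lying within 39.2° of both disk axes; this is an illustrative calculation with hypothetical function names, and the authors' own method may differ.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def fraction_within_both(misalignment_deg, limit_deg, n_samples, seed=0):
    """Fraction of isotropic directions within limit_deg of both disk axes."""
    rng = random.Random(seed)
    d = math.radians(misalignment_deg)
    # Disk A axis along +z; disk B axis tilted by the misalignment in the x-z plane.
    bx, bz = math.sin(d), math.cos(d)
    cos_limit = math.cos(math.radians(limit_deg))
    hits = 0
    for _ in range(n_samples):
        x, _, z = random_unit_vector(rng)
        if z > cos_limit and x * bx + z * bz > cos_limit:
            hits += 1
    return hits / n_samples


estimate = fraction_within_both(60.0, 39.2, n_samples=200_000, seed=1)
print(f"{100 * estimate:.2f}% of orbit orientations")
```

The overlap of the two 39.2° caps covers a little under 2% of the sphere, consistent with the quoted value.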
angular resolution of the observations is shown by the beam size in grey at lower
left. RA, right ascension; dec., declination. b, Velocity-weighted emission, 
illustrating the rotation of both disks, and their misaligned orientations. 

This result is consistent with recent simulations of binary formation,
which predict that disks will be misaligned with the binary orbit,
especially in systems with orbital semimajor axis lengths greater than 100 AU,
where dissipation mechanisms do not act quickly to align the disks with
the orbit. In earlier simulations of the formation of individual binary
systems from isolated cloud cores, the level of misalignment depended
on the choice of initial conditions. However, more recent simulations
focus on the formation of entire clusters and thus do not presuppose
specific initial conditions (or even a particular formation mechanism)
for an individual binary. In the cluster simulations of ref. 26, all binary
systems with orbital semimajor axes greater than 30 AU have disks that
are misaligned with each other, with a mean angle of 70° ± 8°. The
misalignment we observe here is thus consistent with formation by means
of turbulent fragmentation rather than disk instability.

Although it remains to be seen how the protoplanetary disks in a 
statistical sample of young binary systems are oriented, it is suggestive 
Figure 2 | Data, best-fit model and data–model difference for the disks
around HK Tau A and B. Contours are in steps of 28 mJy, the root mean
squared noise in the map, starting at three times that value. Negative contours
are dashed. North is up and east is to the left, with tickmarks at 1 arcsec
intervals. Three channels near the line centre, which has a velocity of
6.1 km s⁻¹, are omitted from the figure and from calculating the chi-squared
statistic in the modelling, owing to absorption from the surrounding cloud.
Figure 3 | Posterior probability distribution for the angle Δ between the two
disks’ angular momentum vectors. The purple histogram is for the case
where both disks’ vectors are on the same side of the sky plane; the green
histogram is for the case where they are on opposite sides of the sky plane. The
68.3% and 95.4% credible intervals are shown by dashed and dotted lines,
respectively. See Methods for the definition of Δ.


that in the handful of systems where this measurement has been made, 
the misalignments are large. If this is a common outcome of the binary 
formation process, and especially if it extends to lower-mass binary 
companions (which may easily go undetected), then perturbations by 
distant companions may account for many of the orbital properties 
that make the present sample of extrasolar planets so unlike the planets 
of our own Solar System. 


METHODS SUMMARY 


The CO(2-1) and CO(3-2) ALMA observations of HK Tau were calibrated using 
standard techniques. The antenna configuration yielded respective spatial
resolutions (here defined by beam sizes from the CLEAN algorithm for image
reconstruction) of 1.06 arcsec × 0.73 arcsec and 0.69 arcsec × 0.51 arcsec and spectral
resolutions of 1.3 km s⁻¹ and 0.85 km s⁻¹ in the two bands. To determine the disk
orientations, we calculated azimuthally symmetric, vertically isothermal parame- 
terized disk models using a Monte Carlo radiation transfer code, and then sampled 
the model images at the same spatial frequencies and velocities as the observations, 
allowing us to compare models directly with the calibrated complex visibilities (that 
is, the Fourier transform of the sky brightness distribution) recorded by the inter- 
ferometer. A Bayesian Markov chain Monte Carlo analysis yielded posterior prob- 
ability distributions for the disk parameters. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


We observed HK Tau with the Atacama Large Millimeter Array (ALMA) as part of
a survey of pre-main-sequence binaries in the Taurus-Auriga star-forming region.
Band-6 observations were taken on 17 November 2012 with 27 antennas and band- 
7 observations on 16 November 2012 with 28 antennas. The correlator was con- 
figured with each of the four basebands covering a total bandwidth of 1.875 GHz 
with a channel spacing of 488 kHz. In band 6, one of the correlator basebands was 
set to cover the CO(2-1) transition at 230.5 GHz, whereas in band 7, one baseband 
covered CO(3-2) at 345.8 GHz. We took one observation of HK Tau in each band, 
bracketed by observations of the gain calibrator J051002+180041, which measures
the phase and amplitude response as a function of time. We calibrated the data for 
each band separately using the CASA software and scripts provided by the NRAO 
ALMA centre. The system temperature, water vapour phase corrections and flag- 
ging were applied using the standard scripts. The amplitude and phase as functions 
of frequency were calibrated against J0423-013. The absolute flux calibration used 
Callisto and the 2012 flux models, which resulted in a zero-spacing flux of 8.54 Jy at 
230 GHz and 19.45 Jy at 345 GHz. 

We generated continuum and CO images using the CLEAN task within CASA, 
with a robust beam weighting of −1.0. These settings resulted in a clean beam size
of 1.06 arcsec × 0.73 arcsec in band 6 and 0.69 arcsec × 0.51 arcsec in band 7. The
continuum flux of HK Tau is sufficient to provide a self-calibration reference, and 
we applied a phase-only self-calibration using HK Tau as the reference. Given the 
short time on source, we averaged the continuum data to a single point in cal- 
culating the self-calibration corrections. The channel spacing, combined with 
Hanning smoothing in the correlator, provides a spectral resolution of 0.85 km s⁻¹
for the CO(3-2) line and 1.3 km s⁻¹ for the CO(2-1) line. The continuum
emission is not strong enough to substantially affect the individual channels in the CO
data and, thus, we did not subtract it. 
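
The quoted spectral resolutions follow from the 488 kHz channel spacing and the line frequencies. The sketch below assumes that Hanning smoothing makes the effective resolution twice the channel spacing; that factor is an assumption adopted here because it reproduces the quoted numbers, and the constant and function names are hypothetical.

```python
C_KM_S = 299_792.458        # speed of light, km/s
CHANNEL_SPACING_HZ = 488e3  # correlator channel spacing from the text
HANNING_FACTOR = 2.0        # assumed: resolution = 2 x channel spacing


def velocity_resolution(line_freq_hz):
    """Velocity width (km/s) of one resolution element at the given line frequency."""
    return C_KM_S * HANNING_FACTOR * CHANNEL_SPACING_HZ / line_freq_hz


for name, freq_ghz in (("CO(2-1)", 230.5), ("CO(3-2)", 345.8)):
    print(f"{name}: {velocity_resolution(freq_ghz * 1e9):.2f} km/s")
```

Rounded to two significant figures, the two values agree with the 1.3 km s⁻¹ and 0.85 km s⁻¹ given in the text.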

The maps show clearly detected CO emission centred at an LSR velocity of 
roughly 6.1 km s⁻¹. Examination of the individual channels of the CO data shows the
presence of foreground absorption in the LSR velocity range of roughly 5-8 km s⁻¹,
consistent with the absorption seen in the single-dish '*CO spectrum®. 

To quantify the disk properties, in particular the spatial orientation of each disk, 
we fitted a series of models to the 345 GHz CO(3-2) data. Following many recent 
authors, we adopt a form for our disk model that is given by a self-similarity solu- 
tion for circumstellar disks*, and use the specific parameterization of ref. 34. 

Although circumstellar disks in binary systems may be warped owing to inter-
actions with their stellar companions, the amount of warping is predicted to
be largest for disks with aspect ratios less than 0.05. In contrast, the HK Tau B disk 
is relatively thick; with its measured scale height of 3.8 AU at a radius of 50 AU (ref. 18), 
the HK Tau B disk is predicted to have little or no warping. Assuming that the
thickness of the HK Tau A disk is similar, warping should be of minimal import- 
ance for these disks, and we thus adopt an azimuthally symmetric disk model. 

The gas density distribution in the model is azimuthally symmetric, and given by

$$\rho(r,z) = \frac{\Sigma(r)}{\sqrt{2\pi}\,H_p(r)} \exp\left[-\frac{1}{2}\left(\frac{z}{H_p(r)}\right)^2\right]$$

where z is the vertical height above the disk midplane, and $\Sigma$ is the surface density
distribution, given by

$$\Sigma(r) = \Sigma_c \left(\frac{r}{r_c}\right)^{-\gamma} \exp\left[-\left(\frac{r}{r_c}\right)^{2-\gamma}\right]$$

where $\Sigma_c$ is a constant such that the surface density at the characteristic radius $r_c$ is
$\Sigma_c/e$. $H_p$ is the pressure scale height, assumed to be in hydrostatic equilibrium and
thus given by

$$H_p(r) = \left(\frac{k_B T r^3}{\mu m_H G M_*}\right)^{1/2}$$

where T is the temperature, $k_B$ is Boltzmann’s constant, $\mu$ is the mean molecular
weight of the gas, $m_H$ is the mass of a hydrogen atom and $M_*$ is the mass of the star.
The disk is assumed to be vertically isothermal, and the radial temperature profile
is assumed to satisfy a power law and is normalized at 10 AU:

$$T(r) = T_0 \left(\frac{r}{10\,\mathrm{AU}}\right)^{-q}$$


Because the ambient radiation in the molecular cloud heats material even far from 
any star, we adopt a minimum temperature of 10 K; that is, the power law above 
applies only out to the radius where T(r) = 10 K, beyond which the temperature is 
constant at 10K. 
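
The disk structure described above can be written as a few short functions. This is a minimal sketch, not the authors' code: the function names, the SI constants and the mean molecular weight of 2.37 are assumptions made for illustration, and the density is returned in units of the surface density per AU.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
M_H = 1.6735575e-27    # mass of a hydrogen atom, kg
G = 6.67430e-11        # gravitational constant, SI
M_SUN = 1.98847e30     # solar mass, kg
AU = 1.495978707e11    # astronomical unit, m


def temperature(r_au, t0, q, t_min=10.0):
    """Power law normalized at 10 AU, with the 10 K floor from the text."""
    return max(t0 * (r_au / 10.0) ** (-q), t_min)


def scale_height(r_au, t0, q, m_star_msun, mu=2.37):
    """Hydrostatic pressure scale height in AU (mu is an assumed value)."""
    r = r_au * AU
    t = temperature(r_au, t0, q)
    h = math.sqrt(K_B * t * r ** 3 / (mu * M_H * G * m_star_msun * M_SUN))
    return h / AU


def surface_density(r_au, sigma_c, r_c, gamma):
    """Self-similar surface density; equals sigma_c / e at r = r_c."""
    x = r_au / r_c
    return sigma_c * x ** (-gamma) * math.exp(-x ** (2.0 - gamma))


def density(r_au, z_au, sigma_c, r_c, gamma, t0, q, m_star_msun):
    """Gas density, in units of sigma_c's surface-density unit per AU."""
    h = scale_height(r_au, t0, q, m_star_msun)
    sigma = surface_density(r_au, sigma_c, r_c, gamma)
    return sigma / (math.sqrt(2.0 * math.pi) * h) * math.exp(-0.5 * (z_au / h) ** 2)
```

The parameter values passed to these functions in any real fit would come from the sampler; none are fixed here.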

We assume that the dust and gas have the same temperature at a given radius, 
that the gas is in local thermodynamic equilibrium, that the gas-to-dust ratio by
mass is 100 and that the number fraction of CO in the gas is 10⁻⁴. With these
assumptions, there are six free parameters that characterize the disk emission and
kinematics in the model: $M_{\mathrm{disk}}$, $r_c$, $T_0$, $M_*$, $\gamma$ and $q$. In addition, there are the two
orientation parameters for the disk: its position angle PA and its inclination i to the 
line of sight. It is these latter two properties that are of primary interest to us in 
determining the disks’ misalignment; the other six are varied to reproduce the 
observed emission adequately, but we make no claim that they represent the true 
disk properties in detail, given the simplicity of the model and degeneracies 
between the parameters. We fix the position of each component at the coordinates 
determined from fits to the velocity-integrated (first-moment) maps of the CO 
emission, and we fix the line centres for both components at 6.1 km s⁻¹.

To find the distributions of parameter values that fit the data, we calculate a set 
of model disks using the Monte Carlo radiation transfer code RADMC-3D version 
0.35 (ref. 38). The standard approach to comparing models to interferometric data 
is to transform the model images into complex visibilities in the u-v plane (where 
the visibility is the Fourier transform of the sky brightness distribution and u and v 
are the coordinates in that plane) so that they can be compared directly with the 
calibrated data recorded by the interferometer, without the intervening, nonlinear 
step of creating an image from the interferometric data. In the case of a binary system 
where both disks have strong emission, this presents an additional complication; 
although the two disks are cleanly separated in the image plane, their emission 
overlaps in the u-v plane. Thus, it is necessary to compute models for both disks to 
compare models to data in the u-v plane. This increases the number of free para- 
meters for each step in the model-data comparison from 8 to 16, complicating the 
exploration of the parameter space. 

To make this problem more tractable, we pursue a modelling strategy that rests 
on the assumption that the best-fit disk parameters for one star are uncorrelated 
with those of the other star, allowing us to fit for only 8 parameters at a time. As a 
preliminary step, we model the two disks in the HK Tau system individually. For 
each component of the binary, we use RADMC-3D with the model described 
above to create a single model disk, with images at different velocities across the 
CO(3-2) line that are separated by the velocity resolution of our observations. We 
then use the NRAO software CASA to sample the model image with the same u-v 
coverage as our ALMA observations, and we create a CLEAN image in exactly the 
same way as we imaged our observations of HK Tau. The resultant model image is 
compared with a sub-image of our data with the same field of view, velocity 
channel spacing and pixel scale, and we calculate χ² between model and data.
Using this image-plane modelling and the MCMC analysis described in more 
detail below, we find the model parameters that provide the best fits for the A 
and B disks in the image plane. 

With these disk parameter estimates, we then proceed with the more robust u-v 
plane modelling. To make the exploration of parameter space tractable, we vary 
parameters for only one disk at a time. In each model run, we hold constant the 8 
parameters for one disk at values previously found to give a good fit, and vary only 
the 8 parameters for the other disk. We combine the two disk model images (one of 
which is always the same for a given run) into a single image with the disks centred 
at the known positions of HK Tau A and B. We then sample this model image with 
the same projected baselines used in the ALMA observations to generate model 
visibilities that can be compared directly with the data. We bin the data and models 
into 0.85 km s⁻¹ channels, the spectral resolution of the observations, and exclude
the three channels near the line centre (LSR velocity range, 5.4-7.9 km s⁻¹), where
there is significant absorption from the cloud. We then calculate χ² between the
model and data visibilities, with separate terms in the χ² sum for the real and
imaginary parts of each visibility point. The 10 channels shown in Fig. 2 (spanning
LSR velocities 0.3-5.4 and 7.9-11.3 km s⁻¹) are used in calculating χ².
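
The visibility comparison above can be sketched as follows. The 13-channel layout (0.85 km s⁻¹ channels starting at 0.3 km s⁻¹, with channels 6 to 8 covering the absorbed range) is inferred from the quoted velocity ranges, and the per-point weights (taken here as inverse variances of each real and imaginary component) are an assumption; function names are hypothetical.

```python
def kept_channels(count=13, excluded=range(6, 9)):
    """Indices of channels used in the fit, skipping the absorbed ones."""
    return [k for k in range(count) if k not in excluded]


def chi_squared(data_vis, model_vis, weights):
    """Chi-squared with separate real and imaginary terms per visibility.

    data_vis, model_vis: sequences of complex visibilities.
    weights: assumed to be 1/sigma^2 for each component of each point.
    """
    total = 0.0
    for d, m, w in zip(data_vis, model_vis, weights):
        diff = d - m
        total += w * (diff.real ** 2 + diff.imag ** 2)
    return total


example = chi_squared([1 + 1j, 2 + 0j], [1 + 0j, 1 + 0j], [1.0, 4.0])
```

Summing the real and imaginary residuals separately is equivalent to using the squared modulus of the complex residual when both components share one weight.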

Because multiple combinations of the model parameters can provide almost 
equally good fits to the data, and because the parameter space is large, we use 
MCMC to determine the posterior probability distribution of each parameter. As 
noted above, in each chain we vary only the 8 parameters for one of the disks. We 
use the Python code emcee, which implements an affine-invariant ensemble
sampler. For most parameters we use a flat prior probability, with the exception
of the inclination, where we use a sin(i) prior probability to account for the fact that
randomly oriented disks are not uniformly distributed in i. We
evaluate the posterior probability of each model as exp(−χ²/2) times the prior
probability. We ran several separate chains to explore a variety of starting positions 
for the disk’s free parameters, and different fixed parameters for the other disk. In 
each chain, the ensemble had 30 ‘walkers’ and ran for at least 500 steps. For each 
chain, we discarded the first 150 steps (4,500 model evaluations) as ‘burn-in’ so 
that the results would be independent of the starting positions chosen. Because the 
results from different chains were consistent with each other, we combined them 
to produce our final parameter estimates. Not including the burn-in steps, our 
final results for HK Tau A and HK Tau B are based on 66,000 and 30,000 model
evaluations, respectively. As noted above, in the case of HK Tau B, the position
angle and inclination are well known from scattered-light imaging, and so for
HK Tau B we adopt the PA and i values found from previous work in the analysis 
that follows, combined with our new measurements for HK Tau A. 
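
The posterior evaluation described above reduces to a short log-probability function of the kind an ensemble sampler would call. The sketch below is illustrative only: it does not call emcee, the function names are hypothetical, and the 0° to 180° support of the inclination prior is an assumption.

```python
import math


def log_prior(inclination_deg, lo=0.0, hi=180.0):
    """Log of the sin(i) prior; flat priors on other parameters add a constant."""
    if not (lo < inclination_deg < hi):
        return -math.inf
    return math.log(math.sin(math.radians(inclination_deg)))


def log_posterior(chi2, inclination_deg):
    """Log of exp(-chi2 / 2) times the prior, up to an additive constant."""
    lp = log_prior(inclination_deg)
    if lp == -math.inf:
        return -math.inf
    return -0.5 * chi2 + lp


# Burn-in bookkeeping quoted in the text: 30 walkers, first 150 steps discarded.
WALKERS, BURN_IN_STEPS = 30, 150
burn_in_evaluations = WALKERS * BURN_IN_STEPS
```

Working with the logarithm avoids underflow when χ² is large, which is why samplers are usually given a log-probability.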

The key quantity we are interested in determining is the angle Δ between the
two disks’ angular momentum vectors. It is related to the measured position angles
and inclinations through spherical trigonometry by

$$\cos\Delta = \cos i_1 \cos i_2 + \sin i_1 \sin i_2 \cos(\mathrm{PA}_1 - \mathrm{PA}_2)$$


With both inclinations specified in the usual range of 0° to 90°, the above equation 
effectively assumes that both disks have their angular momentum vectors oriented 
on the same side of the plane of the sky. For the case where the two vectors are on 
opposite sides of the sky plane, one i above should be replaced with 180° − i if i is
defined always to be less than 90°. Here we adopt the convention used in specifying
the inclination of visual binary orbits, where i ranges from 0° to 180°. In this con-
vention, i < 90° corresponds to the case where the disk’s orbital motion is in the
direction of increasing position angle, or, equivalently, where the disk’s angular 
momentum vector is inclined by an angle 90° — i towards the observer relative to 
the sky plane. Thus, although our adopted convention for position angle (that of 
the redshifted edge of the disk) is the same as that typically adopted in previous 
work, our inclination convention differs.

By this convention, the inclination of the HK Tau B disk is 95° (because it is
known from scattered-light images that the northern face of the disk is tilted
towards Earth), and the best-fit inclination of the HK Tau A disk could be either
43° ± 5° or 137° ± 5°. In practice, the two cases do not yield greatly differing
values of Δ because HK Tau B is so close to edge-on.
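
The spherical-trigonometry relation for the misalignment angle, and the weak dependence on which inclination is chosen for HK Tau A, can be illustrated numerically. The inclinations below are the values quoted in the text; the two position angles (0° and 50°) are hypothetical placeholders, since this section does not list the measured values.

```python
import math


def misalignment_deg(i1_deg, pa1_deg, i2_deg, pa2_deg):
    """Angle between two disks' angular momentum vectors, in degrees."""
    i1, i2 = math.radians(i1_deg), math.radians(i2_deg)
    dpa = math.radians(pa1_deg - pa2_deg)
    cos_delta = (math.cos(i1) * math.cos(i2)
                 + math.sin(i1) * math.sin(i2) * math.cos(dpa))
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding
    return math.degrees(math.acos(cos_delta))


PA_A, PA_B = 0.0, 50.0  # hypothetical position angles, for illustration only
delta_low_i = misalignment_deg(43.0, PA_A, 95.0, PA_B)
delta_high_i = misalignment_deg(137.0, PA_A, 95.0, PA_B)
print(f"{delta_low_i:.1f} deg vs {delta_high_i:.1f} deg")
```

Because cos(95°) is small, the first term of the relation changes little when the inclination of HK Tau A is replaced by its supplement, so the two results differ by only a few degrees.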

In the near future, it may be possible to distinguish between these two inclina-
tions for HK Tau A. A recently discovered Herbig-Haro object, HH 678, lies
10 arcmin west of HK Tau. Its position angle of 267° with respect to HK Tau
places it on a line that is nearly perpendicular to the HK Tau A disk, suggesting that 
it may be associated. If so, the sign of the radial velocity of the Herbig-Haro object 
would break the inclination degeneracy of the HK Tau A disk. 

We used fixed values of the orientation of the HK Tau B disk, and the values of 
PA and i for HK Tau A from our MCMC chains, to find the posterior distribution
of Δ for the two disks (Fig. 3). We take the median of the posterior distribution as
the most probable value, and we find the values above and below the median that 
encompass 34.15% of the total probability in each direction to define the 68.3% 
credible interval (dashed lines); we similarly calculate the 95.4% credible interval 
(dotted lines). A plot of the posterior distributions of PA and i for HK Tau A
(Extended Data Fig. 1) shows that they are uncorrelated, as expected. 
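
The median and the equal-tailed credible intervals described above can be read off a set of posterior samples as sketched below. The linear-interpolation quantile and the function name are illustrative choices, not the authors' stated procedure.

```python
def credible_interval(samples, mass=0.683):
    """Return (lower, median, upper) enclosing `mass` of the probability,
    with mass/2 on each side of the median."""
    ordered = sorted(samples)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    half = mass / 2.0
    return quantile(0.5 - half), quantile(0.5), quantile(0.5 + half)


lower, median, upper = credible_interval(list(range(101)))
```

Calling the same function with mass=0.954 gives the wider interval.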

Although our primary focus is the relative orientations of the disks, the mod- 
elling used here has the potential to determine other parameters of interest, in 
particular the stellar mass. Pre-main-sequence stellar mass measurements are of 
particular interest because they place valuable constraints on pre-main-sequence 
evolutionary models. Unfortunately, owing to our modest spatial resolution,
coupled with the compact size of the HK Tau disks and the cloud absorption over a
range of several kilometres per second near the line centre, we are unable to place tight
constraints on the stellar masses. Our MCMC analysis yields $M_* = 0.6 \pm 0.1\,M_\odot$ for
HK Tau A and $M_* = 1.0 \pm 0.1\,M_\odot$ for HK Tau B, where the quoted credible inter-
vals do not take into account the uncertainty contribution from the distance to the
HK Tau system. The HK Tau A mass is consistent with previous mass estimates
from pre-main-sequence evolutionary tracks. However, the HK Tau B mass is
quite surprising. The published spectral types of HK Tau A and B are M1 and M2,
respectively, and near-infrared, high-resolution spectra similarly yield spectral
types of M0.5 and M1 for HK Tau A and B (L. Prato, manuscript in preparation). 
Given its cooler spectral type, and assumed coeval formation, HK Tau B should be 
less massive than HK Tau A. A possible explanation of the mass discrepancy would 
be if HK Tau B were itself a close binary. However, the near-infrared spectra show
that the radial velocities of HK Tau A and B are the same to within 1 km s⁻¹, with
no evidence of double lines in the spectra of either star (L. Prato, manuscript in 
preparation). 

Thus, we suspect that our stellar mass estimate for HK Tau B may be inaccurate. 
It may be that our simple models do not adequately reproduce the vertical struc- 
ture of the disk, which is likely to be much more important in modelling a nearly 
edge-on disk, like HK Tau B, than in modelling one that is more face-on, like
HK Tau A. For example, ALMA science verification data of the disk around
HD 163296 show that a vertical temperature gradient is necessary to reproduce
the CO emission. It is also possible that the uncertainty in the exact systemic
velocity of the system (due to contamination from the molecular cloud) is a factor. 
Using a fixed systemic velocity parameter may introduce a small bias in the fit 
parameters, particularly the stellar mass. However, we see no structure in the 
residuals that would arise from using a systemic velocity far from the correct value. 

We emphasize that the position angle and inclination for HK Tau B used in the 
analysis of disk misalignment were taken from previous scattered-light imaging, 
and that modelling uncertainties for HK Tau B thus do not affect our main result 
here. Future ALMA data with better spatial resolution and using an isotopomer 
that is less sensitive to cloud absorption may help resolve the puzzle of HK Tau B’s 
stellar mass. 
Extended Data Figure 1 | Posterior probability distributions for the position angle and inclination of the disk around HK Tau A.

Mapping the optimal route between two quantum states

S. J. Weber, A. Chantasri, J. Dressel, A. N. Jordan, K. W. Murch & I. Siddiqi


A central feature of quantum mechanics is that a measurement result 
is intrinsically probabilistic. Consequently, continuously monitor- 
ing a quantum system will randomly perturb its natural unitary evo- 
lution. The ability to control a quantum system in the presence of 
these fluctuations is of increasing importance in quantum informa- 
tion processing and finds application in fields ranging from nuclear 
magnetic resonance to chemical synthesis. A detailed understanding
of this stochastic evolution is essential for the development of opti-
mized control methods. Here we reconstruct the individual quantum
trajectories of a superconducting circuit that evolves under the
competing influences of continuous weak measurement and Rabi 
drive. By tracking individual trajectories that evolve between any 
chosen initial and final states, we can deduce the most probable path 
through quantum state space. These pre- and post-selected quantum 
trajectories also reveal the optimal detector signal in the form of a 
smooth, time-continuous function that connects the desired bound- 
ary conditions. Our investigation reveals the rich interplay between 
measurement dynamics, typically associated with wavefunction col- 
lapse, and unitary evolution of the quantum state as described by the 
Schrödinger equation. These results and the underlying theory, based
on a principle of least action, reveal the optimal route from initial to
final states, and may inform new quantum control methods for state 
steering and information processing. 

Our experiment focuses on the dynamics of two quantum levels of a 
superconducting circuit (a quantum bit, or qubit), which can be conti- 
nuously measured and excited by microwave pulses. To access indivi- 
dual quantum trajectories, we make use of the fact that fully projective 
measurement (or wavefunction collapse) happens over an average time-
scale τ controlled by the interaction strength between the system and
the detector. By recording the measurement signal with high fidelity in
time steps much shorter than τ, we realize a continuous sequence of
weak measurements and track the qubit state as it evolves in a single 
experimental iteration. Individual weak measurements have been recently 
used in atomic physics experiments that probe wavefunction collapse
and demonstrate state stabilization. In the domain of superconducting
circuits, weak measurements have only recently been realized, owing
to the challenge associated with high-fidelity detection of microwave sig-
nals near the single-photon level. Advances in superconducting para-
metric amplifiers have enabled continuous feedback control, the
observation of individual quantum trajectories, the determination
of weak values and the entanglement of qubits.

In previous work, we demonstrated the ability to track individual
quantum trajectories using continuous quantum non-demolition weak 
measurement. To fully understand the nature of these trajectories, it is 
necessary to explore their statistical and dynamical properties. Here, by 
examining a large number of trajectories, we gain insight into the con- 
ditional dynamics of open quantum systems. We consider the subset of 
trajectories that end in a particular final state, which reveals the most 
probable path connecting two points in quantum state space. Further-
more, whereas previous work considered only the case of continuous
measurement, we now introduce a concurrent drive at the qubit fre-
quency, resulting in Rabi oscillations that turn qubit state populations
into coherences, and vice versa. We are able to track quantum trajecto- 
ries that exhibit dynamics associated with both measurement backaction 
and unitary evolution, and find that our theoretical formalism quan-
titatively describes the family of trajectories that connect two points in
quantum state space. 

Our experiment consists of a superconducting transmon circuit dis-
persively coupled to a waveguide cavity (Fig. 1a). Considering only the
two lowest levels of the transmon as a qubit, our system is described by
the Hamiltonian $H = H_0 + H_{\mathrm{int}} + H_R$, where

$$H_{\mathrm{int}} = -\hbar\chi\, a^\dagger a\, \sigma_z, \qquad H_R = \hbar\frac{\Omega}{2}\sigma_y$$

and where $H_0$ describes the qubit and cavity energy and decay terms.
Here ħ is the reduced Planck’s constant, $a^\dagger$ and $a$ are respectively the
creation and annihilation operators for the cavity mode, and $\sigma_y$ and $\sigma_z$
are qubit Pauli operators. The Hamiltonian $H_R$ describes a microwave
drive at the qubit transition frequency, which induces unitary evolu-
tion of the qubit state characterized by the Rabi frequency Ω, and $H_{\mathrm{int}}$
is the interaction term, characterized by the dispersive coupling rate
Figure 1 | Set-up. a, A transmon circuit is dispersively coupled to a three-
dimensional copper waveguide cavity. Microwave signals that reflect off the
cavity port are amplified by a lumped-element Josephson parametric
amplifier (LJPA) operating near the quantum limit. b, A microwave tone that
probes the cavity near resonance is shown as a phasor in the X₁–X₂ plane, with
zero-point quantum fluctuations shown by the shaded region. c, Ground
and excited energy levels are shown on the transmon potential. d, The reflected
microwave tone acquires a qubit-state-dependent phase shift that is smaller
than the quantum fluctuations of the measurement signal. After further
amplification, the X₂ quadrature of the measurement tone is digitized. e, The
measurement is calibrated by examining the distributions of measurement
signals for the qubit prepared in the |0⟩ (blue) and |1⟩ (red) states.
Figure 2 | Quantum trajectories of the quantum state on the Bloch sphere
are plotted against time. The upper panels depict the full ensemble evolution. 
The middle panels depict individual quantum trajectories (dashed curves), with 
comparison with their tomographic reconstructions (solid curves). At the 


χ/2π = −0.6 MHz. This term describes a qubit-state-dependent frequency
shift of the cavity, which we use to perform quantum state measurement 
in our system. We will work in a rotating frame to eliminate the pre- 
cession of the Bloch vector from the energy level splitting of the qubit. 
As depicted in Fig. 1b-e, a microwave tone that probes the cavity near 
its resonance frequency will acquire a qubit-state-dependent phase shift. 
If the measurement tone is very weak, quantum fluctuations of the 
electromagnetic mode fundamentally obscure this phase shift, result-
ing in a partial or weak measurement of the qubit state. We use a near-
quantum-limited parametric amplifier to amplify the X₂ quadrature
of the reflected signal, which is proportional to the qubit-state-dependent
phase shift. After further amplification, we digitize the signal in 16 ns
time steps, resulting in a measurement signal V(t). Each time step is small
compared with the characteristic measurement time, $\tau = \kappa/(16\chi^2\bar{n}\,\eta_{\mathrm{col}}\eta_{\mathrm{amp}})$,
where $\bar{n}$ is the average intracavity photon number, κ/2π = 9.0 MHz is
the cavity decay rate and $\eta_{\mathrm{col}}\eta_{\mathrm{amp}}$ is the measurement quantum effi-
ciency, which decomposes into separate collection and amplification
efficiencies. The characteristic measurement time τ is calibrated by exam-
ining (Gaussian) histograms of the measurement results for the qubit
prepared in the $\sigma_z$ eigenstates |0⟩ and |1⟩, and is defined by the time it
takes to separate the two distributions by two standard deviations,
ΔV = 2σ.

In our experiment, we prepare the qubit in the positive eigenstate of
the $\sigma_x$ Pauli operator (along the x axis of the Bloch sphere), by first making
a projective measurement along the z axis and then a π/2-rotation about
the y axis. By considering only the instances where the measurement
result is found to be |0⟩, we herald the preparation of a high-fidelity
ground state. Then a measurement tone at 6.8316 GHz continuously
probes the cavity for a variable time t, which weakly measures the qubit
in the $\sigma_z$ basis. Finally, we apply further rotations and perform a pro-
jective measurement to conduct quantum state tomography. In Fig. 2
(top panels), we show the ensemble-averaged tomography for three dif-
ferent Rabi drive strengths. From these curves, we extract Ω/2π = 0,
0.56 and 1.08 MHz and the ensemble coherence decay rate Γ by com-
parison with theory as discussed in Methods. From Γ, we calculate a
total quantum efficiency $\eta_{\mathrm{tot}} = 1/2\Gamma\tau = \eta_{\mathrm{col}}\eta_{\mathrm{amp}}\eta_{\mathrm{env}} = 0.4$, where the
last factor indicates the (nearly negligible) extra environmental dephas-
ing $\eta_{\mathrm{env}} = (1 + \kappa/8\chi^2\bar{n}T_2^*)^{-1}$, with $T_2^* = 15$ µs.
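
As a quick numerical check of the total quantum efficiency, the relation between the efficiency, the ensemble decay rate Γ and the measurement time τ can be evaluated with the values of Γ and τ given in the figure captions. The function name is hypothetical.

```python
def total_efficiency(gamma_ens_per_s, tau_s):
    """Total quantum efficiency eta_tot = 1 / (2 * Gamma * tau)."""
    return 1.0 / (2.0 * gamma_ens_per_s * tau_s)


# Caption values: driven data (tau = 315 ns) and undriven data (tau = 1.25 us).
eta_driven = total_efficiency(3.85e6, 315e-9)
eta_undriven = total_efficiency(0.94e6, 1.25e-6)
print(f"{eta_driven:.2f}, {eta_undriven:.2f}")
```

Both data sets give an efficiency that rounds to the quoted value of 0.4.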

In each iteration of the experiment, we can use the recorded mea- 
surement signal to calculate the best estimate for the qubit state con- 
ditioned on the measurement record. As discussed in Methods, at each 
time step we apply a two-step update procedure to track the evolution 
of the system density matrix ρ. We account for the measurement result
using a quantum generalization of Bayes’ rule, and we account for the
bottom, we plot individual trajectories (magenta) and the ensemble averages
(green) in the x-z plane of the Bloch sphere. a-c correspond to different values
of the Rabi drive: Ω/2π = 0 MHz (a), 0.56 MHz (b) and 1.08 MHz (c). Here
τ = 315 ns and Γ = 3.85 × 10⁶ s⁻¹.


Rabi drive by applying a unitary rotation. Our finite detector efficiency 
reflects our imperfect knowledge about the state of the system and results 
in a decay of coherence given by rate γ = Γ − 1/2τ. From the density
matrix ρ, we calculate expectation values of the Pauli operators con-
ditioned on the measurement signal: $x = \mathrm{tr}[\rho\sigma_x]$, $y = \mathrm{tr}[\rho\sigma_y]$ and
$z = \mathrm{tr}[\rho\sigma_z]$ (the components of the Bloch vector).
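
The two-step update can be sketched directly in terms of the Bloch components x and z. The details are deferred to Methods, so the form below is an assumption: a standard Bayesian update for a Gaussian weak measurement of z, with the readout r scaled so that it averages to z, followed by a rotation about the y axis. Function names and the time units (microseconds) are illustrative.

```python
import math


def bayes_update(x, z, r, dt, tau, gamma):
    """Measurement step: condition (x, z) on readout r over a time dt.

    Assumed Gaussian-likelihood form; gamma adds the extra coherence decay.
    """
    t = math.tanh(r * dt / tau)
    norm = 1.0 + z * t
    z_new = (z + t) / norm
    x_new = x * math.sqrt(1.0 - t * t) / norm * math.exp(-gamma * dt)
    return x_new, z_new


def rabi_rotation(x, z, omega, dt):
    """Unitary step: rotate by omega * dt in the x-z plane."""
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    return x * c + z * s, z * c - x * s


def step(x, z, r, dt, tau, gamma, omega):
    """One full time step: measurement update, then drive rotation."""
    x, z = bayes_update(x, z, r, dt, tau, gamma)
    return rabi_rotation(x, z, omega, dt)
```

With gamma set to zero the measurement step maps pure states to pure states, which is one way to see why individual trajectories can stay far purer than the ensemble average.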

In Fig. 2a, we display a sample trajectory with no drive (Ω = 0) that
shows the stochastic motion of the qubit state as it evolves under mea-
surement and is ultimately projected into the |0⟩ state. As described in
Methods, we use conditioned quantum state tomography to reconstruct 
Figure 3 | Greyscale histograms of quantum trajectories in the undriven
case. Measurement duration, 1.424 µs. a, b, Histograms of all measured z
(a) and x (b) trajectories, beginning from state ($x_I = 0.97$, $z_I = 0$).
Representative trajectories are shown in colour. c, d, Histograms of trajectories
z (c) and x (d), conforming to the final chosen boundary condition,
$z_F = -0.85 \pm 0.03$. The most likely trajectories from the experimental data are
shown as magenta curves, with their standard deviations shown by the magenta
bands. The most likely paths in z and x predicted from the theory are shown
as yellow dashed curves. Other representative trajectories are shown in other
colours. Inset in d, the most likely trajectory is plotted on the x-z plane of the
Bloch sphere. Here τ = 1.25 µs and Γ = 0.94 × 10⁶ s⁻¹.
the trajectory. Figure 2b, c demonstrates that we can track the state
faithfully in the presence of unitary state evolution induced by a drive 
at the qubit frequency. The drive induces Rabi oscillations while the 
system is being continuously measured. The resulting dynamics is fully 
quantum, going beyond the pure measurement case. These trajectories
highlight the stark difference between ensemble dynamics and the dynam-
ics of individual quantum trajectories; whereas the ensemble average
decays rapidly to a mixed state, the individual trajectories remain remark-
ably pure despite the modest quantum efficiency, $\eta_{\mathrm{tot}} = 0.4$.

Using this ability to track individual trajectories starting from a given 
initial state, we now consider the sub-ensemble of trajectories that arrive 
at a particular final state at a given time. This sub-ensemble allows us 
to examine the conditional quantum dynamics of the state that satisfy 
two boundary conditions, one in the past (‘pre-selection’) and one in 
the future (‘post-selection’). This is similar to an analysis that leads to 
‘weak values’, and time-continuous generalizations that con-
sider an additional projective post-selection measurement. In contrast
to that approach, we use only a solitary continuous measurement: the 
pre-selection is just the initial state, and the post-selection is simply what- 
ever the state is when the detector stops measuring. The resulting aver- 
age of the measurement output gives a ‘weak function’ that connects the 
boundary conditions. 

To investigate the full ensemble and post-selected sub-ensemble dynam- 
ics, we perform 10° iterations of the experiment with a measurement 
duration of 1.424 µs. For each experiment, we construct the quantum
state trajectory by finding x and z for every time step. Figure 3 displays
the measurement dynamics for Ω = 0. We consider the sub-ensemble
of trajectories that have final values (z(1.424 µs), x(1.424 µs)) within 0.03
of $(z_F, x_F) = (-0.85, 0.23)$. This analysis allows us to examine proper-
ties of the conditional trajectories such as the most likely path that con-
nects pre- and post-selected states.
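
Post-selection of this kind amounts to filtering the recorded trajectories on their final point. In the sketch below each trajectory is assumed to be a list of (x, z) pairs, one per time step; the function names are hypothetical, and the pointwise average is only a simple summary of the sub-ensemble, not the most-likely-path estimator described in Methods.

```python
def post_select(trajectories, x_target, z_target, window):
    """Keep trajectories whose final (x, z) lies within `window` of the target."""
    kept = []
    for trajectory in trajectories:
        x_final, z_final = trajectory[-1]
        if abs(x_final - x_target) <= window and abs(z_final - z_target) <= window:
            kept.append(trajectory)
    return kept


def mean_path(trajectories):
    """Pointwise average of equal-length trajectories."""
    n = len(trajectories)
    return [(sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
            for points in zip(*trajectories)]


toy = [
    [(0.97, 0.0), (0.25, -0.84)],
    [(0.97, 0.0), (0.60, 0.50)],
    [(0.97, 0.0), (0.21, -0.86)],
]
kept = post_select(toy, x_target=0.23, z_target=-0.85, window=0.03)
average = mean_path(kept)
```

A narrower window selects a cleaner boundary condition at the cost of fewer trajectories, so the window is a trade-off between precision and statistics.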

The most likely paths can be theoretically calculated using a stochastic 
path integral representation of the joint probability of the measurement 
outcomes at every point in time with boundary condition constraints. 
The conditional detector backaction on the quantum state can be imposed 
at every time step with Lagrange multipliers $(p_x, p_z)$ as auxiliary dynam-
ical parameters. Finding the extremum of the stochastic action leads to
equations of motion for the optimal path connecting the boundary con-
ditions. As we discuss in Methods, this corresponds to optimizing the
total path probability between the states. Because the experiment oper-
ates in the x-z plane of the Bloch sphere, the (deterministic) equations
of motion for the optimized path are

$$\dot{x} = -\gamma x + \Omega z - x z r/\tau \tag{1}$$
$$\dot{z} = -\Omega x + (1 - z^2)\, r/\tau \tag{2}$$
$$\dot{p}_x = +\gamma p_x + \Omega p_z + p_x z r/\tau \tag{3}$$
$$\dot{p}_z = -\Omega p_x + (p_x x + 2 p_z z - 1)\, r/\tau \tag{4}$$

where $x$, $z$, $p_x$, $p_z$ and $r$ are now functions of time (with a dot denoting
a time derivative) and $r = z + p_z(1 - z^2) - p_x x z$. Here r is the optimal
readout and relates to the optimal detector signal as follows: $V_{\mathrm{opt}} = \Delta V\, r/2$.
This rescaling makes r an estimation of z without post-selection (that
Figure 4 | Greyscale histograms of quantum trajectories in the driven case.
The measurements begin at state ($x_I = 0.88$, $z_I = 0$). Here τ = 315 ns,
Γ = 3.85 × 10⁶ s⁻¹, Ω/2π = 1.08 MHz. a, b, Histograms for z (a) and x (b) with
representative trajectories plotted in colour and with the average trajectory
shown in black. In the other panels in a and b, we post-select on the final state
($z_F = 0.7$, $x_F = -0.29$), with a post-selection window of ±0.08. Solid magenta
curves are the most likely trajectories for the experimental data, and the
yellow dashed curves are from the theory. The standard deviations of the
experimentally determined most likely paths are shown by magenta bands.
As the time duration between the boundary conditions is increased from
t₁ = 0.464 µs to t₂ = 0.944 µs and then to t₃ = 1.424 µs, the most likely
trajectory connecting the initial and final states changes drastically but is well
described by the theory (dashed line). The bottom panels compare the optimal
detector signals (r; dashed lines) with the conditioned average signal (weak
functions; black lines).
is, $p_x = p_z = 0$). The solution to these nonlinear equations admits four
constants of motion, which permits the imposition of both initial $(x_I, z_I)$
and final $(x_F, z_F)$ boundary conditions.

The equations have a simple analytic solution $(\bar{x}, \bar{z})$ for Ω = 0. We
consider measurement for a time T, starting in the initial state ($x_I = 1$,
$z_I = 0$) and ending in a state $(x_F, z_F)$ (in this particular case, $x_F$ is deter-
mined by the choice of $z_F$). The solution of equations (1)-(4) is
($\bar{x}(t) = e^{-\gamma t}\,\mathrm{sech}(\bar{r}t/\tau)$, $\bar{z}(t) = \tanh(\bar{r}t/\tau)$), where $\bar{r} = (\tau/T)\tanh^{-1}(z_F)$ is the
detector output of maximum likelihood. These solutions are plotted in
Fig. 3, showing agreement with the experimentally obtained most likely
path (Methods). The most likely times between different boundary con-
ditions are shown in Methods and Extended Data.
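
The undriven solution can be evaluated with the numbers given for Fig. 3 (τ = 1.25 µs, Γ = 0.94 × 10⁶ s⁻¹, a 1.424 µs measurement and a final z of −0.85). The sketch works in microseconds; the function names are hypothetical, and scaling the x solution by an initial value of 0.97 instead of 1 is an assumption that relies on equation (1) being linear in x when Ω = 0.

```python
import math


def most_likely_readout(z_final, tau, duration):
    """Constant readout r_bar = (tau / T) * artanh(z_F)."""
    return (tau / duration) * math.atanh(z_final)


def undriven_path(t, r_bar, tau, gamma, x_initial=1.0):
    """Analytic most likely path (x, z) at time t for zero drive."""
    arg = r_bar * t / tau
    x = x_initial * math.exp(-gamma * t) / math.cosh(arg)
    z = math.tanh(arg)
    return x, z


TAU, DURATION, GAMMA_ENS = 1.25, 1.424, 0.94   # microseconds, 1/microsecond
gamma = GAMMA_ENS - 1.0 / (2.0 * TAU)           # extra decay rate, Gamma - 1/(2 tau)
r_bar = most_likely_readout(-0.85, TAU, DURATION)
x_f, z_f = undriven_path(DURATION, r_bar, TAU, gamma, x_initial=0.97)
print(f"r_bar = {r_bar:.3f}, x_F = {x_f:.3f}, z_F = {z_f:.3f}")
```

Under these assumptions the final x value falls inside the ±0.03 post-selection window around 0.23, consistent with the statement that the final x is fixed once the final z is chosen.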

In Fig. 4, we display the full ensembles and post-selected ensembles 
for the driven case (Ω/2π = 1.08 MHz). Depending on the amount of
time between the initial and final states, the competition between mea-
surement and Schrödinger dynamics produces different (and non-trivial)
optimal routes, alternatively showing diffusive Rabi oscillation dynam-
ics and quantum jump dynamics (where the system is effectively
pinned in one of the eigenstates). We compare the experimentally deter- 
mined most likely trajectories (Methods) with the most likely paths 
obtained from solving equations (1)-(4). The equations were numerically 
solved with a shooting method to satisfy both initial and final boundary 
conditions at different times. These numerical solutions show reasonable 
agreement with the experimentally determined most likely curves. 
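
A shooting method treats the boundary-value problem as a search over the unknown initial multipliers, integrating equations (1)-(4) forward until the final condition is met. The sketch below is a deliberately reduced illustration, not the authors' solver: it uses a fixed-step Runge-Kutta integrator and is demonstrated only on the undriven case (Ω = 0, starting at x = 1, z = 0 with the x multiplier set to zero), where the final z increases monotonically with the initial z multiplier, so a one-parameter bisection suffices. The driven case needs a two-parameter root search, which is not implemented here. All function names are hypothetical and times are in microseconds.

```python
import math


def rhs(state, tau, gamma, omega):
    """Right-hand sides of equations (1)-(4) for state (x, z, p_x, p_z)."""
    x, z, px, pz = state
    r = z + pz * (1.0 - z * z) - px * x * z
    return (
        -gamma * x + omega * z - x * z * r / tau,
        -omega * x + (1.0 - z * z) * r / tau,
        gamma * px + omega * pz + px * z * r / tau,
        -omega * px + (px * x + 2.0 * pz * z - 1.0) * r / tau,
    )


def integrate(state, duration, tau, gamma, omega, steps=400):
    """Fixed-step fourth-order Runge-Kutta integration over `duration`."""
    h = duration / steps
    for _ in range(steps):
        k1 = rhs(state, tau, gamma, omega)
        k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)], tau, gamma, omega)
        k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)], tau, gamma, omega)
        k4 = rhs([s + h * k for s, k in zip(state, k3)], tau, gamma, omega)
        state = [s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


def shoot_initial_pz(z_target, duration, tau, gamma, lo=-3.0, hi=3.0, iterations=50):
    """Bisection on p_z(0) so that z(duration) hits z_target (undriven case)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        z_end = integrate([1.0, 0.0, 0.0, mid], duration, tau, gamma, 0.0)[1]
        if z_end < z_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


pz0 = shoot_initial_pz(-0.85, 1.424, 1.25, 0.54)
```

In this reduced case the initial multiplier found by shooting should coincide with the constant readout of the analytic undriven solution, which provides a convenient check on the integrator.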

In addition to the quantum paths, the solution of equations (1)-(4) 
also gives the optimal detector response for moving the quantum system 
to the target state after a given time. We compare these optimal signals to 
the conditioned average detector signals (weak functions) in Fig. 4. The 
post-selection allows the conditioned average detector signal r to exceed 
the usual range of [−1, 1] for z. This behaviour is analogous to that of 
weak values, which can also lie outside their eigenvalue range. 

The ability to find and verify the most likely path between chosen 
initial and final quantum states under continuous measurement extends 
our fundamental understanding of quantum measurement and advances 
the field of quantum control of individual systems. Our results give deep 
insight into the quantum dynamics and associated measurement read- 
out, as revealed by the ability to condition on the final quantum state in 
the presence of a continuous coherent Rabi drive. The data presented 
here are in good agreement with our stochastic path integral formalism, 
predicting the global most likely path, and open the way to solving related 
optimization problems important to controlling a quantum system. Exam- 
ples of future applications of this approach, in the specific area of super- 
conducting qubits, include using the continuous measurement results 
for improved state preparation, state estimation and Hamiltonian para- 
meter estimation. Multiple-qubit architectures can also be fabricated, 
with each qubit having its own measurement device, enabling optimal 
continuous control protocols for an ensemble of superconducting qubits 
with real-time state monitoring. Furthermore, the present work can be 
extended to solve more general optimization problems in quantum mech- 
anics, such as finding the most likely path from a separable state to a 
desired entangled state. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Device parameters. The qubit consists of two aluminium paddles connected by a 
double-angle-evaporated aluminium SQUID deposited on double-side-polished 
silicon. The qubit is characterized by a charging energy E_C/h = 200 MHz and a 
Josephson energy E_J/h = 11 GHz. The qubit is operated with negligible flux 
threading the SQUID loop with a transition frequency ω_q/2π = 4.01057 GHz. The 
qubit is located off centre of a 6.8316 GHz copper waveguide cavity. With the 
measurement tone on, the qubit transition frequency was a.c.-Stark shifted to 
4.00748 GHz. Qubit pulses and drive are performed at the a.c.-Stark-shifted 
frequency. 

The lumped-element Josephson parametric amplifier (LJPA) consists of a 
two-junction SQUID, formed from 2 μA Josephson junctions shunted by 3 pF of 
capacitance, and is flux biased to provide 20 dB of gain at the cavity resonance 
frequency. The LJPA is pumped by two sidebands equally spaced 300 MHz above and 
below the cavity resonance. 

Experimental set-up. Extended Data Fig. 1 displays a schematic of the experimental 
set-up. Experimental sequences start with an 800 ns readout to herald the |0⟩ state 
(z = +1), followed by a 16 ns π/2-rotation about the y axis to prepare the qubit 
along the x axis. The state preparation fidelity is 88% for the data shown in Figs 2 
and 4, and is 97% for the data shown in Fig. 3. After a period of variable duration, 
we perform quantum state tomography by applying either rotations about the x and 
y axes, or no rotation, followed by a second 800 ns readout. Tomography results were 
corrected for the readout fidelity of 95%. 

Calibration of the measurement. We calibrate the characteristic measurement 
time τ by examining histograms of the measurement signal for the qubit prepared 
in either the |0⟩ or the |1⟩ state. We prepare these states through a herald readout 
and then digitize the measurement signal for a variable period of time. The 
resulting distributions are approximately Gaussian: 

$$P(V|0) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(V-\Delta V/2)^2/2\sigma^2}$$

$$P(V|1) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(V+\Delta V/2)^2/2\sigma^2}$$

We fit the distributions to determine ΔV, the voltage separation of the peaks, and 
the variance σ². The quantity S = ΔV²/σ² increases linearly with integration time t: 
S = 4t/τ. We fit this relationship to determine the characteristic measurement time τ. 
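
The last step of this calibration amounts to a straight-line fit through the origin. The sketch below shows one way to do it, assuming the pairs (t, S) have already been extracted from the histograms; the function name, the synthetic data and the choice of an unweighted least-squares fit are illustrative assumptions rather than the procedure actually used.

```python
def fit_tau(times, snr):
    """Least-squares slope through the origin for S = (4 / tau) * t; returns tau."""
    slope = sum(t * s for t, s in zip(times, snr)) / sum(t * t for t in times)
    return 4.0 / slope

# Synthetic, noiseless example: integration times in microseconds and the
# corresponding S = dV^2 / sigma^2 values generated for tau = 1.25 microseconds.
times = [0.1 * k for k in range(1, 11)]
snr = [4.0 * t / 1.25 for t in times]
tau_fit = fit_tau(times, snr)
```

With real data the points scatter about the line, and a weighted fit using the uncertainty of each S value may be preferable.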

To calibrate the initial state and the total dephasing rate, we prepare the qubit 
along the x axis and perform quantum state tomography after a variable period of 
time. The tomography results for the full ensemble are shown in Fig. 2a, and exhibit 
exponential decay of coherence at rate Γ. The total quantum measurement efficiency 
is given by η_tot = 1/2Γτ. Note that the total quantum measurement efficiency 
η_tot = η_col η_amp η_env is the product of the efficiencies for collection, for 
amplification and from extra environmental dephasing. We use the tomography value 
at t = 0 to determine the initial state, denoted (x_0, z_0). 

To determine the Rabi frequency, Ω, we examine the ensemble tomography results 
as shown in Fig. 2b, c. The ensemble evolution is given by the Lindblad equation 
with arbitrary Rabi drive: ẋ(t) = −Γx(t) + Ωz(t), ż(t) = −Ωx(t). With initial 
state (x_0, z_0), these equations have an analytic solution 

$$x(t) = e^{-\Gamma t/2}\left( x_0\cos(\lambda t) - \frac{\Gamma x_0 - 2\Omega z_0}{2\lambda}\sin(\lambda t) \right)$$

$$z(t) = e^{-\Gamma t/2}\left( z_0\cos(\lambda t) + \frac{\Gamma z_0 - 2\Omega x_0}{2\lambda}\sin(\lambda t) \right) \qquad (5)$$

where λ = [Ω² − (Γ/2)²]^{1/2}. We use equation (5) to determine the Rabi frequency Ω 
for each measurement strength and Rabi drive amplitude. 
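
As a consistency check on the ensemble solution, the sketch below evaluates the damped-oscillation form x(t) = e^{−Γt/2}[x_0 cos λt − (Γx_0 − 2Ωz_0) sin λt / 2λ], z(t) = e^{−Γt/2}[z_0 cos λt + (Γz_0 − 2Ωx_0) sin λt / 2λ], with λ² = Ω² − (Γ/2)², and verifies by central differences that it satisfies ẋ = −Γx + Ωz and ż = −Ωx. The parameter values and function name are assumptions chosen for illustration, and the underdamped case Ω > Γ/2 is assumed throughout.

```python
import math

def ensemble_xz(t, x0, z0, omega, Gamma):
    """Ensemble-averaged Bloch coordinates for the underdamped case omega > Gamma / 2."""
    lam = math.sqrt(omega ** 2 - (Gamma / 2.0) ** 2)
    env = math.exp(-Gamma * t / 2.0)
    c, s = math.cos(lam * t), math.sin(lam * t)
    x = env * (x0 * c - (Gamma * x0 - 2.0 * omega * z0) / (2.0 * lam) * s)
    z = env * (z0 * c + (Gamma * z0 - 2.0 * omega * x0) / (2.0 * lam) * s)
    return x, z

# Illustrative values: rates in rad per microsecond, time in microseconds.
omega, Gamma, x0, z0 = 2 * math.pi * 1.08, 1.0, 0.88, 0.0
t, h = 0.7, 1e-5

x, z = ensemble_xz(t, x0, z0, omega, Gamma)
x_plus, z_plus = ensemble_xz(t + h, x0, z0, omega, Gamma)
x_minus, z_minus = ensemble_xz(t - h, x0, z0, omega, Gamma)

# Residuals of the two ensemble equations, estimated by central differences.
res_x = (x_plus - x_minus) / (2 * h) - (-Gamma * x + omega * z)
res_z = (z_plus - z_minus) / (2 * h) - (-omega * x)
```

In a fit to tomography data, Ω would be the free parameter of `ensemble_xz`, with Γ and (x_0, z_0) fixed by the calibrations described above.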

Propagation of the qubit-state density matrix. Given the Rabi frequency Ω, the 
coherence decay rate γ and the initial qubit state calculated from the values of x_0 
and z_0 at time t = 0, we propagate the initial state to states at later time steps 
t = dt, 2dt, ..., ndt using a two-step procedure. At any time t, we first apply a 
unitary rotation to account for the Rabi drive 

$$\rho'_{01} = \rho_{01} + \frac{\Omega}{2}\,(\rho_{00} - \rho_{11})\,dt \qquad (6)$$

$$\rho'_{11} = \rho_{11} + \frac{\Omega}{2}\,(\rho_{01} + \rho_{10})\,dt \qquad (7)$$

where ρ_00, ρ_01, ρ_10 and ρ_11 are matrix elements of a qubit density matrix ρ(t). 
With the input values ρ'_01 and ρ'_11, we next apply the Bayesian update to them 
based on the measurement result obtained in the time interval between t and t + dt, 
and get 

$$\rho_{11}(t+dt) = \frac{(\rho'_{11}/\rho'_{00})\,\exp(-4V(t)\,dt/\tau\Delta V)}{1 + (\rho'_{11}/\rho'_{00})\,\exp(-4V(t)\,dt/\tau\Delta V)} \qquad (8)$$

$$\rho_{01}(t+dt) = \rho'_{01}\,\frac{\sqrt{(1-\rho_{11}(t+dt))\,\rho_{11}(t+dt)}}{\sqrt{(1-\rho'_{11})\,\rho'_{11}}}\; e^{-\gamma\,dt} \qquad (9)$$

We use dt = 16 ns as the data sampling interval, and V(t) is the measurement result 
obtained between t and t + dt. As discussed in the main text, we validate the state 
update procedure using conditioned quantum state tomography and find good 
agreement between individual trajectories and the tomographic reconstructions. 
Moreover, in the time-continuum limit dt → 0, we can approximate the state 
update procedure (equations (6)-(9)) with the differential equations 

$$\dot{x}(t) = -\gamma x(t) + \Omega z(t) - x(t)z(t)r(t)/\tau \qquad (10)$$

$$\dot{z}(t) = -\Omega x(t) + (1 - z(t)^2)\,r(t)/\tau \qquad (11)$$

where r(t) = 2V(t)/ΔV is the dimensionless measurement signal, and x(t) = 
tr[σ_x ρ(t)] and z(t) = tr[σ_z ρ(t)] are the Bloch vector coordinates as functions of 
time. 
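
A minimal sketch of the two-step update (Rabi rotation followed by the Bayesian update) is given below. It assumes a real coherence ρ_01 = ρ_10, which holds when the y component of the Bloch vector is zero; the function names and the numerical values in the example are illustrative assumptions, and the voltage record here is a constant placeholder rather than measured data.

```python
import math

def update(rho11, rho01, V, dt, omega, gamma, tau, dV):
    """Advance (rho11, rho01) by one time step dt given the voltage record V."""
    rho00 = 1.0 - rho11
    # Step 1: first-order unitary rotation for the Rabi drive.
    r01 = rho01 + 0.5 * omega * (rho00 - rho11) * dt
    r11 = rho11 + 0.5 * omega * (2.0 * rho01) * dt   # rho01 + rho10 = 2 rho01
    r00 = 1.0 - r11
    # Step 2: Bayesian update of the populations from the likelihood ratio.
    odds = (r11 / r00) * math.exp(-4.0 * V * dt / (tau * dV))
    new11 = odds / (1.0 + odds)
    # Coherence rescaled with the populations, then damped at rate gamma.
    scale = math.sqrt(new11 * (1.0 - new11) / (r11 * (1.0 - r11)))
    new01 = r01 * scale * math.exp(-gamma * dt)
    return new11, new01

def bloch(rho11, rho01):
    """Bloch coordinates (x, z) for a real coherence."""
    return 2.0 * rho01, 1.0 - 2.0 * rho11

# Example: no drive, no extra dephasing, constant record V = 0.3 dV, 100 steps
# of dt = 16 ns (times in microseconds), starting from the state x = 1, z = 0.
rho11, rho01 = 0.5, 0.5
n, dt, tau, V, dV = 100, 0.016, 1.25, 0.3, 1.0
for _ in range(n):
    rho11, rho01 = update(rho11, rho01, V, dt, omega=0.0, gamma=0.0, tau=tau, dV=dV)
x, z = bloch(rho11, rho01)
```

In this undriven, undamped example the update is exact at every step: z follows tanh(n r dt/τ) with r = 2V/ΔV, and an initially pure state stays pure.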

Tomographic validation. To verify that we have accurately tracked the quantum 
state of the system, we perform quantum state tomography at discrete times along 
the trajectory. We denote the target trajectory, which is based on a single run of the 
experiment, (x̃(t), z̃(t)). For each experimental sequence of total measurement 
duration t, we propagate ρ and, if x(t) = x̃(t) ± 0.03 and z(t) = z̃(t) ± 0.03, then 
the subsequent tomography results are included in the tomographic 
reconstruction of the state at time t. We repeat this analysis for all time steps 
between 0 and 1.6 μs, and find good agreement between the individual trajectories 
and the tomographic reconstructions. 

Histogram scaling. The greyscale histograms shown in Figs 3 and 4 represent the 
values of x and z at each time point, binning with the bin size of 0.02. The greyscale 
shading is normalized such that the most frequent value is 1 at each time point. 
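
The histogram scaling just described can be sketched for a single time point as follows. The bin size of 0.02 is taken from the text; the range [−1, 1], the handling of edge values and the function name are assumptions made for this example.

```python
from collections import Counter

def normalized_histogram(values, bin_size=0.02, lo=-1.0, hi=1.0):
    """Bin the values of x (or z) at one time point and scale the peak bin to 1."""
    n_bins = int(round((hi - lo) / bin_size))
    counts = Counter(
        min(max(int((v - lo) / bin_size), 0), n_bins - 1)   # clamp edge values
        for v in values
    )
    peak = max(counts.values())
    return [counts.get(b, 0) / peak for b in range(n_bins)]

# Three hypothetical trajectory values at one time point.
shades = normalized_histogram([0.01, 0.011, 0.31])
```

Repeating this for every time step gives one column of the greyscale image per time point, each with its own normalization.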

Derivation of the ordinary differential equations in equations (1)-(4). We 
consider a set of unitless measurement readouts {r_k} = {r_0, r_1, ..., r_{n−1}}, where 
r_k = 2V_k/ΔV, at times {t_k = k dt} for k = 0, 1, ..., n − 1, and its corresponding set 
of qubit states, denoted by {q_k}. In our experiment, the y component of the qubit 
Bloch coordinates is always zero, and q_k is thus a two-dimensional vector: q_k = 
(x_k, z_k). We write a joint probability density function of all measurement outcomes 
{r_k}, the quantum states {q_k} and the chosen final state q_F, conditioned on the 
initial state q_I, as 

$$P(\{q_k\},\{r_k\},q_F\,|\,q_I) = \delta^2(q_0 - q_I)\,\delta^2(q_n - q_F) \prod_{k=0}^{n-1} P(q_{k+1}|q_k, r_k)\,P(r_k|q_k) \qquad (12)$$

Here P(q_{k+1}|q_k, r_k) is a probability density function of a qubit state at time 
t_{k+1} given a qubit state and measurement signal at the previous time t_k. Because a 
qubit state at any time t_{k+1} is updated deterministically from q_k and r_k, the 
density function P(q_{k+1}|q_k, r_k) is a delta function whose argument imposes the 
state update equations. The conditional distribution of the detector output 
P(r_k|q_k) obtained in a time interval dt is a probability density function of r_k 
given q_k: 

$$P(r_k|q_k) = \sqrt{\frac{dt}{2\pi\tau}} \left( \frac{1+z_k}{2}\, e^{-\frac{dt}{2\tau}(r_k-1)^2} + \frac{1-z_k}{2}\, e^{-\frac{dt}{2\tau}(r_k+1)^2} \right)$$

By expressing the delta functions in equation (12) in Fourier-transformed forms 
with conjugate variables p_k = (p_k^x, p_k^z) for k = −1, 0, ..., n and other terms in 
exponential form, we can write the joint probability density function in a path 
integral representation P({q_k},{r_k},q_F|q_I) ∝ ∫ Dp e^S. Here Dp is an integral 
measure over conjugate variables {p_k}, and S is an action given by 

$$S = -p_{-1}\cdot(q_0 - q_I) - p_n\cdot(q_n - q_F) + \sum_{k=0}^{n-1}\left\{ -p_k\cdot\left(q_{k+1} - E[q_k, r_k]\right) + \ln P(r_k|q_k) \right\} \qquad (13)$$

$$S = B + \int_0^T dt \left[ -p_x\dot{x} - p_z\dot{z} + p_x\left(-\gamma x + \Omega z - xzr/\tau\right) + p_z\left(-\Omega x + (1-z^2)\,r/\tau\right) - (r^2 - 2rz + 1)/2\tau \right] \qquad (14)$$

where we have used the operator E[q_k, r_k] to indicate the state update, and B as a 
short-hand for the first two terms in equation (13). We note that, in equation (14), 
we have taken the time-continuum limit dt → 0 and written the action explicitly 
for our qubit measurement case with the state update equations (10) and (11). We 
have also used shortened notation for the variables, for example x = x(t) = 
lim_{dt→0} {x_0, x_1, ..., x_n}. To obtain the most likely path, we then extremize the 
action in equation (14) over all variables (x, z, p_x, p_z, r) and obtain the ordinary 
differential equations (ODEs) shown in equations (1)-(4): 

$$\dot{x} = -\gamma x + \Omega z - xzr/\tau \qquad (15)$$

$$\dot{z} = -\Omega x + (1 - z^2)\,r/\tau \qquad (16)$$

$$\dot{p}_x = \gamma p_x + \Omega p_z + p_x z r/\tau \qquad (17)$$

$$\dot{p}_z = -\Omega p_x + (p_x x + 2 p_z z - 1)\,r/\tau \qquad (18)$$

Here r = z + p_z(1 − z²) − p_x x z and the forced boundary conditions are x(t = 0) = 
x_I, z(t = 0) = z_I, x(t = T) = x_F, z(t = T) = z_F. As discussed in the main text, we 
can analytically solve the ODEs in equations (15)-(18) when Ω = 0. For the driven 
case, where Ω ≠ 0, we solve the equations numerically using a shooting method. 

Interpretation of the solutions of the ODEs. Here we discuss the interpretation 
of the solution of the ODEs in equations (15)-(18) (and equations (1)-(4)). The 
extremization of the action in equation (13) can also be interpreted as a constrained 
optimization of the last term of equation (13), Σ_{k=0}^{n−1} ln P(r_k|q_k), which is the 
log-likelihood of the trajectory. The constraints are as follows: (1) the qubit state 
updates q_{k+1} = E[q_k, r_k] for k = 0, 1, ..., n − 1; (2) the pre-selected state is 
q_0 = q_I; and (3) the post-selected state is q_n = q_F. The conjugate variables {p_k} 
now act as the Lagrange multipliers of the constrained optimization. With this 
interpretation, a solution of the ODEs in equations (15)-(18) therefore represents a 
path with an optimized value of Σ_{k=0}^{n−1} ln P(r_k|q_k) or its exponential 
Π_{k=0}^{n−1} P(r_k|q_k), that is, a measurement path probability density. 

The most likely path. The optimized path mentioned in the previous subsection 
can represent either a maximum, a minimum or a saddle point of the path 
probability under the constraints. We can determine which by finding paths slightly 
varied from the optimized solution, with all constraints still applied. This can be 
done by adding small constants δ_1 and δ_2 to the right-hand sides of the differential 
equations of the conjugate variables p_x and p_z (equations (17) and (18)), leaving the 
equations for x and z unchanged, and solving the whole system with the same 
boundary conditions. Solutions of the modified ODEs will be slightly varied from the 
optimized path. We then compute their full-path probabilities, comparing with the 
probability calculated from the optimized path. In Extended Data Fig. 2, we show 
samples of paths from the variational method described here and the unnormalized 
full-path probability of the surrounding paths. In this case, it shows that the 
optimized solution is the most likely path, with a maximum value of the path 
probability density. 

The most likely paths from the experimental post-selected trajectories. To find 
the most likely path from experimental trajectory data, we first define the closeness 
of any two trajectories (named a and b) as a time-average of the Euclidean distance: 

$$D = \frac{1}{n}\sum_{k=0}^{n-1} \sqrt{\left(x_a(t_k) - x_b(t_k)\right)^2 + \left(z_a(t_k) - z_b(t_k)\right)^2}$$


where t_k = k dt and dt is the time step. For a qubit, the Euclidean distance is the 
trace distance (up to a factor of 2) between two quantum states ρ_1 and ρ_2, given by 
(1/2)tr|ρ_1 − ρ_2|. We compute the distance D between all possible pairs of 
trajectories starting from the same initial state q_I = (x_I, z_I) and ending around a 
final state q_F = (x_F, z_F) with a small tolerance. We then search for N trajectories 
that have minimum average distance to all others, and average them to obtain an 
estimate of the most likely path. The number of trajectories N is chosen to be about 
10% of the total number of trajectories in the sub-ensemble (N is of order 10², 
compared with 10³ trajectories in the sub-ensemble). As a result, we get a smooth 
estimate of the most likely path, which is still very different from the total 
(sub-ensemble) averaged trajectory. We plot the standard deviation of the data of x 
and z for the chosen 10% of trajectories for every time step as a shaded band in 
Figs 3 and 4. As shown in those figures, the experimentally determined most likely 
paths closely approximate the theoretical most likely paths, that is, the solutions of 
the ODEs in equations (15)-(18). We expect that the approximated curves converge to 
the smooth theory curves in the limit of an infinite ensemble of post-selected 
trajectories. 
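
The selection procedure described above can be sketched as follows, assuming each post-selected trajectory is stored as a list of (x, z) pairs on a common time grid. The function names, the data layout and the toy trajectories are assumptions made for illustration; the distance is the time-averaged Euclidean distance between two trajectories.

```python
import math

def distance(a, b):
    """Time-averaged Euclidean distance between two trajectories of (x, z) pairs."""
    return sum(math.hypot(xa - xb, za - zb)
               for (xa, za), (xb, zb) in zip(a, b)) / len(a)

def most_likely_path(trajs, fraction=0.1):
    """Average the trajectories with the smallest mean distance to all others."""
    m = len(trajs)
    mean_dist = [0.0] * m
    for i in range(m):
        for j in range(i + 1, m):
            d = distance(trajs[i], trajs[j])
            mean_dist[i] += d
            mean_dist[j] += d
    mean_dist = [d / (m - 1) for d in mean_dist]
    n_keep = max(1, round(fraction * m))
    keep = sorted(range(m), key=mean_dist.__getitem__)[:n_keep]
    n = len(trajs[0])
    path = [(sum(trajs[i][k][0] for i in keep) / n_keep,
             sum(trajs[i][k][1] for i in keep) / n_keep) for k in range(n)]
    return path, keep

# Toy sub-ensemble: five constant trajectories, one of them an outlier in z.
offsets = [-0.02, -0.01, 0.0, 0.01, 0.5]
trajs = [[(0.5, dz) for _ in range(4)] for dz in offsets]
path, keep = most_likely_path(trajs, fraction=0.2)
```

In the toy example the outlier pulls the plain average of z to about 0.1, whereas the selected path stays with the central trajectory. The pairwise comparison costs a number of operations quadratic in the number of trajectories, which is manageable for sub-ensembles of order 10³.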

In some cases, we can simply look at a trajectory of local medians (medians of x 
or z at all time steps) and compare it with the theoretical most likely path. The 
median trajectory can in practice be a good approximation to the theory curve when 
the distribution of the post-selected trajectories is a narrow band, that is, when the 
post-selected trajectories lie closely around a single path. As an example, in the case 
where there is no drive on the qubit, Ω = 0, we show in our theory paper that the 
median curves agree quite well with the most likely curves. However, in the driven 
case where the qubit trajectories can possibly have different winding numbers 
around the y axis, resulting in multiple most likely paths from an initial state to a 
final state, simply finding the median of the distribution of x or z is not enough to 
capture their most likely behaviour. In this paper, we focus only on the cases where 
there is a single most likely path between any two boundary states. We will discuss 
our findings concerning the multiple paths connecting two boundary states in a 
future work. 

The most likely time. Apart from the path of maximum likelihood taken between 
the pre- and post-selected states in a fixed time, a complementary problem in 
quantum control is that of the optimal waiting time between starting and destination 
states. In the case where there is no Rabi drive on the qubit, Ω = 0, we can fix the 
states at the endpoints and inquire about the most likely time taken to travel between 
them. While a path integral derivation of the most likely time is possible, we give a 
simpler derivation here based on the probability distribution of the time-averaged 
measurement readout V̄ = (1/n) Σ_{k=0}^{n−1} V_k. 

In the case with no drive on the qubit, the z coordinate of the qubit on the Bloch 
sphere at any time T is solely determined by V̄. We can derive the distribution of 
the final z coordinate (z_F) at any time T, given the initial z coordinate (z_I), 
P(z_F|z_I), from the probability density function P(V̄|z_I). The probability density 
function of V̄, given z_I, is 

$$P(\bar{V}|z_I) = \sqrt{\frac{n}{2\pi\sigma^2}} \left( \frac{1+z_I}{2}\, e^{-\frac{n}{2\sigma^2}(\bar{V}-\Delta V/2)^2} + \frac{1-z_I}{2}\, e^{-\frac{n}{2\sigma^2}(\bar{V}+\Delta V/2)^2} \right)$$

where the variance of the voltage signal measured in a time dt is σ² = ΔV²τ/4dt. 
We change variable from the time-averaged measurement signal V̄ to the final 
z component z_F as follows: V̄ = (τΔV/2T)[tanh⁻¹(z_F) − tanh⁻¹(z_I)]. We obtain 
the differential measure dV̄ = (τ/2T)[ΔV/(1 − z_F²)]dz_F. The probability density 
function of z_F given z_I can be computed via the relation P(V̄|z_I)dV̄ = 
P(z_F|z_I)dz_F: 

$$P(z_F|z_I) = \sqrt{\frac{\tau}{2\pi T}}\; \frac{1}{1-z_F^2} \left( \frac{1+z_I}{2}\, e^{-\frac{T}{2\tau}(\bar{r}-1)^2} + \frac{1-z_I}{2}\, e^{-\frac{T}{2\tau}(\bar{r}+1)^2} \right)$$

where r̄ = (τ/T)(tanh⁻¹(z_F) − tanh⁻¹(z_I)). For the case where 
the initial state is x = +1 (z = 0), the probability density function simplifies to 

$$P(z_F|z_I = 0) = \sqrt{\frac{\tau}{2\pi T}}\; (1-z_F^2)^{-3/2} \exp\left( -\frac{T}{2\tau} - \frac{\tau}{2T}\left[\tanh^{-1}(z_F)\right]^2 \right)$$

We then compute the most likely time T_opt, where the probability density function 
P(z_F|z_I) is maximized for the fixed values of z_I and z_F. By maximizing the 
probability function P(z_F|z_I) with respect to T, we obtain 

$$T_{\mathrm{opt}} = \frac{\tau}{2}\left( \sqrt{1 + 4\tilde{r}^2} - 1 \right)$$

where r̃ = tanh⁻¹[(z_F − z_I)/(1 − z_I z_F)]. We show in Extended Data Fig. 3 the 
distributions P(z_F|z_I = 0) as a function of time T for z_F = 0.2, 0.4, 0.6. They show 
very good agreement with the experimental data. 

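
The most likely time can be checked numerically with a short sketch. The density and the closed-form time used below were worked out for this example from the change of variables described above (Gaussian statistics of the time-averaged readout with σ² = ΔV²τ/4dt, and V̄ = (τΔV/2T)[tanh⁻¹(z_F) − tanh⁻¹(z_I)]); they should be treated as a reconstruction under those assumptions. The function names and parameter values are illustrative.

```python
import math

def density(z_f, z_i, T, tau):
    """P(z_F | z_I) at waiting time T for the undriven qubit."""
    r_bar = (tau / T) * (math.atanh(z_f) - math.atanh(z_i))
    pref = math.sqrt(tau / (2.0 * math.pi * T)) / (1.0 - z_f ** 2)
    return pref * (
        0.5 * (1.0 + z_i) * math.exp(-T * (r_bar - 1.0) ** 2 / (2.0 * tau))
        + 0.5 * (1.0 - z_i) * math.exp(-T * (r_bar + 1.0) ** 2 / (2.0 * tau))
    )

def most_likely_time(z_f, z_i, tau):
    """Waiting time that maximizes density(z_f, z_i, T, tau) over T."""
    r_tilde = math.atanh((z_f - z_i) / (1.0 - z_i * z_f))
    return 0.5 * tau * (math.sqrt(1.0 + 4.0 * r_tilde ** 2) - 1.0)

tau = 1.25                                   # microseconds
T_opt = most_likely_time(0.4, 0.0, tau)
peak = density(0.4, 0.0, T_opt, tau)
below = density(0.4, 0.0, 0.99 * T_opt, tau)
above = density(0.4, 0.0, 1.01 * T_opt, tau)
```

Evaluating `density` on a grid of T for z_F = 0.2, 0.4 and 0.6 gives curves of the kind plotted in Extended Data Fig. 3, with `most_likely_time` marking each maximum.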
Extended Data Figure 1 | Experimental schematic. The weak measurement 
tone is always on. The projective readout tone is pulsed. The amplitude and 
phase of the signal displacement tone are adjusted to displace the measurement 
signals back to the origin of the X1-X2 plane, which allows the LJPA to perform 
in the linear regime. 

Extended Data Figure 2 | Paths slightly varied from the optimal solution. 
a, Overplotted x and z coordinates of 11 trajectories slightly varied from 
an optimized solution with boundary conditions (x_I, z_I) = (0.88, 0), 
(x_F, z_F, T_F) = (−0.683, −0.227, 0.464 μs) and the Rabi drive Ω/2π = 1.08 MHz. 
b, The corresponding conjugate variables p_x and p_z. c, d, Plots of the 
unnormalized probability versus changes of the constant δ_1 in the p_x differential 
equation (c) and the unnormalized probability versus changes of the constant 
δ_2 in the p_z differential equation (d). In this case, the optimized solution 
gives a maximum value of the path probability density. 

Extended Data Figure 3 | Optimal time between starting and destination 
states. The probability density functions P(z_F|z_I = 0) plotted as functions of 
time T (solid curves) along with experimental data (dotted curves) with 
τ = 1.25 μs. The red, green and blue curves are the distribution functions 
P(z_F = 0.2|z_I = 0), P(z_F = 0.4|z_I = 0) and P(z_F = 0.6|z_I = 0), respectively. The 
optimized times T_opt for the three cases are shown as the vertical black dashed 
lines with the labels T_0.2, T_0.4 and T_0.6. 

Extended Data Figure 4 | Greyscale histograms of ensemble and 
post-selected trajectories for different Rabi frequencies and measurement 
strengths. a, Ensemble and post-selected trajectories for Ω/2π = 1.08 MHz 
and τ = 1.25 μs. The post-selections for times {464 ns, 944 ns, 1.424 μs} are 
(x_F, z_F) = {(−0.78, −0.5), (0.7, −0.5), (−0.73, −0.5)} with a 
post-selection window of ±0.08. b, Trajectories for Ω/2π = 1.08 MHz and 
τ = 315 ns with (x_F, z_F) = {(−0.69, −0.5), (0.5, −0.5), (−0.73, −0.5)}. 
c, Trajectories for Ω/2π = 0.58 MHz and τ = 315 ns with 
(x_F, z_F) = {(−0.35, −0.5), (−0.5, −0.5), (−0.56, −0.5)}. Note that 
all the trajectories use the same value of z_F. The values of x_F were chosen 
to give a large number of trajectories in the post-selected ensemble. 

Antarctic glaciation caused ocean circulation 
changes at the Eocene-Oligocene transition 


A. Goldner1,2, N. Herold3 & M. Huber3,4 


Two main hypotheses compete to explain global cooling and the abrupt 
growth of the Antarctic ice sheet across the Eocene-Oligocene transition 
about 34 million years ago: thermal isolation of Antarctica due 
to southern ocean gateway opening, and declining atmospheric CO2 
(refs 5, 6). Increases in ocean thermal stratification and circulation in 
proxies across the Eocene-Oligocene transition have been interpreted 
as a unique signature of gateway opening, but at present both 
mechanisms remain possible. Here, using a coupled ocean-atmosphere model, 
we show that the rise of Antarctic glaciation, rather than altered 
palaeogeography, is best able to explain the observed oceanographic 
changes. We find that growth of the Antarctic ice sheet caused 
enhanced northward transport of Antarctic intermediate water and 
invigorated the formation of Antarctic bottom water, fundamentally 
reorganizing ocean circulation. Conversely, gateway openings 
had much less impact on ocean thermal stratification and circulation. 
Our results support available evidence that CO2 drawdown, 
not gateway opening, caused Antarctic ice sheet growth, and 
further show that these feedbacks in turn altered ocean circulation. The 
precise timing and rate of glaciation, and thus its impacts on ocean 
circulation, reflect the balance between potentially positive 
feedbacks (increases in sea ice extent and enhanced primary productivity) 
and negative feedbacks (stronger southward heat transport and 
localized high-latitude warming). The Antarctic ice sheet had a 
complex, dynamic role in ocean circulation and heat fluxes during 
its initiation, and these processes are likely to operate in the future. 

Two main conceptual models have been advanced to explain the 
pattern of cooling, enhanced Antarctic circumpolar circulation, 
biospheric transitions and glaciation across the Eocene-Oligocene 
transition (EOT) (~34.1-33.6 Myr ago). These are Southern Ocean gateway 
opening and CO2 drawdown. The first model, supported by proxy 
interpretations and numerical ocean modelling, posits that opening of 
Southern Ocean gateways caused a fundamental reorganization of ocean 
structure and circulation, cooled the Antarctic region and enhanced 
ocean heat transport into the North Atlantic. The second model is that 
CO2 drawdown and its radiative forcing led to global cooling, enhanced 
near the poles, which initiated Antarctic glaciation. This conceptual model 
is supported by clear evidence of drawdown across the EOT, as well as 
by modelling of ice sheets and of the coupled ocean-atmosphere. 

Although both conceptual models are supported by data and numerical 
models, both hypotheses have deficiencies. The gateway model explains 
many features of the ocean structure and circulation but has three main 
weaknesses. First, the timing of gateway opening is poorly constrained 
and does not seem to match the formation of the Antarctic ice sheet 
(AIS). Second, gateway opening is too gradual to explain rapid changes 
easily, although nonlinear, thresholded responses cannot be ruled out. 
Third, most modelling suggests that open southern high-latitude 
gateways would have little impact on AIS formation. 

The CO2 drawdown argument also has weaknesses. First, it does not 
explain the ocean thermal stratification in benthic δ18O records as the 
gateway mechanism does. Second, the magnitude of the decrease 
in CO2 required to match reconstructed cooling in some previous 
fully coupled EOT modelling is at least twice as large as CO2 proxies 
indicate. 

Figure 1 | Atlantic Ocean δ18O records compiled for the EOT. a, Deep Sea 
Drilling Project (DSDP) and Ocean Drilling Program (ODP) site locations for δ18O 
records (palaeorecords described in Extended Data Table 1) overlaid on late Eocene 
bathymetry. b, Positive values indicate an increase in δ18O going from the late 
Eocene (34-36 Myr ago) into the early Oligocene (31-29 Myr ago) interpolated 
across the Atlantic Ocean basin (model longitudes 315-359°). Raw δ18O anomalies 
are plotted as the inset in b (same values as in Extended Data Fig. 9a). 


1Department of Earth, Atmospheric, and Planetary Sciences, Purdue University, West Lafayette, Indiana 47907, USA. 2American Geophysical Union, Washington DC 20009, USA. 3Department of Earth 
Sciences, University of New Hampshire, Durham, New Hampshire 03824, USA. 4Earth Systems Research Center, Institute for Earth, Ocean and Space Sciences, University of New Hampshire, Durham, New 
Hampshire 03824, USA. 


574 | NATURE | VOL 511 | 31 JULY 2014 


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


Figure 2 | Ocean temperature and 

Recent work extended the CO2 drawdown hypothesis by suggesting 
that AIS formation, caused by a decrease in greenhouse gases, was 
responsible for strengthening the ocean's overturning circulation through 
enhanced sea-ice growth and brine rejection. This is conjectured to 
have invigorated the formation rates of Antarctic bottom water, 
explaining changes in the ocean's thermal gradient detected across the EOT. No 
coupled ocean-atmosphere modelling studies have been performed to test 
the hypothesis that growth of the AIS during the EOT could explain 
reconstructed changes in ocean temperatures, δ18O gradients and circulation. 

In this study we show that inclusion of this simple feedback, AIS 
formation, is enough to bring the CO2 drawdown hypothesis into 
better agreement with both enhanced gradients noted in ocean δ18O records 
and observed cooling at the correct CO2 (see Methods and Extended 
Data Fig. 1). First we review this isotopic signature and then use an Earth 
system model to show that these results can be reconciled. 

The pattern of global surface cooling is now well established from many 
independent proxy records, but cooling inferred from proxies is not 
homogeneous. Annual mean temperature decreases weakly across the 
EOT in northern high latitudes and in the subtropics, whereas cooling 
is strong in the high southern latitudes. This information has been used 
in conjunction with benthic δ18O records to estimate that the EOT is 
associated with a deep ocean cooling by several degrees and the growth 
of an AIS to 50-100% of the modern volume. This inhomogeneous 
signature has been proposed to be uniquely explained by Southern Ocean 
gateway opening. 

To illustrate this signature, we recompile existing Atlantic Ocean 
benthic foraminiferal δ18O records (Fig. 1a and Extended Data Table 1 and 
Extended Data Fig. 9a) and use natural neighbour regridding to generate 
a south-north transect along the Atlantic Ocean basin (Fig. 1b). As has 
been pointed out previously, isotope changes across the EOT are 
consistent with abyssal cooling, but the data support weak warming or 
negligible cooling in the upper subtropical ocean; that is, the vertical and 
meridional gradients in δ18O and presumably temperature were enhanced 
across the EOT. Opening of Southern Ocean gateways, which initiated 
the Drake Passage (DP) effect, is one potential cause of this signature, 
whereas CO2 drawdown by itself does not reproduce this pattern. Here 
we test the hypothesis that growth of the AIS, a feedback ignored in 
previous work, might produce this signature as well. 

We employ a fully coupled atmosphere, sea-ice, land and ocean model 
with representative EOT boundary conditions to quantify the changes 
in ocean temperature, salinity and circulation that occur as a result of 
adding the AIS into the unglaciated late Eocene world. We perform a 
series of simulations using the National Center for Atmospheric Research 
Community Earth System Model (CESM1.0). The unglaciated EOT 
simulations employ topography, vegetation, modern orbital variations, 
bathymetry and ocean gateways described in previous work. For the glaciated 
EOT simulations, we increase topography and albedo over Antarctica to 
nearly modern-day levels (~75% of modern Antarctic ice volume). 

The simulations were integrated for 3,500 model years, beginning from 
the end of previously described equilibrated simulations with an earlier 
model version. Equilibration was verified after calculating near-zero 
surface and deep-ocean trends in temperature and ideal-age tracers (see 
Methods). In the first experiment we add the AIS while holding CO2 
constant at 1,120 p.p.m. In the second experiment CO2 is decreased from 
1,120 to 560 p.p.m. while keeping Antarctica unglaciated. In the third 
experiment, we combine AIS and CO2 perturbations to explore the 
combined impact on the ocean's structure. Finally, we explore the impact of 
the DP and the Tasman Gateway (TG) by opening and closing both 
high-latitude gateways, holding everything else fixed. 

First, we compare the glaciated and unglaciated EOT simulations at 
constant CO2 to understand the response to glaciation alone. We find 
that AIS growth engenders a complex response of regional warming and 
cooling consistent with a previous related study in which ocean heat 
transport was specified. Subtle warming occurs at the surface in the 
Southern Indian Ocean and within subtropical gyres, but southern high 
latitudes cool, especially in the Atlantic, and this is communicated into 
the abyss (Fig. 2a, c and Extended Data Fig. 1). Glaciation shifts the ocean's 
meridional and vertical thermal structure as the deep ocean cools more in 
the Southern Hemisphere than in the Northern Hemisphere, and cooling 
above the thermocline is negligible whereas cooling below it is strong. This 
glaciation-induced cooling is similar to that predicted from opening 
Southern Ocean gateways in previous work that did not use fully coupled 

Figure 3 | Diagnostics for the mechanism for deep ocean cooling 
due to glaciation. a, Anomalous salinity (colour contour) and surface 
salt flux (vectors). b, Sea-level pressure anomaly (colour contour) 
and mean surface wind anomaly (vectors). c, Anomalous upwelling 
and downwelling (colour contour) and Ekman transport (vectors). 
d, Zonally averaged salinity (colour contour) and ocean current 
anomalies in Atlantic Ocean (vectors). (See Extended Data 
Figs 4-7 for absolute fields, including sea level pressure, surface 
salt flux, Ekman transport and ocean velocities.) 

models. With a coupled model we simulate much less cooling due to 
Southern Ocean gateway opening (Fig. 2b). The pattern of cooling induced 
by glaciation has interesting implications for interpreting δ18O records. 

To facilitate comparison with δ18O records we use an empirical 
calibration between temperature and salinity to express ocean model results 
as an equivalent δ18O change (see Methods). Comparison between the 
model and data reveals a good match (Fig. 2d) in terms of the change in 
δ18O gradient. Specifically, there is the positive isotopic shift in δ18O in 
the southern high latitudes, and a relatively small change in δ18O in the 
Northern Hemisphere. The δ18O anomaly induced by opening ocean 
gateways does not produce a positive isotopic shift in the southern high 
latitudes (Extended Data Fig. 2). This is important because previous 
studies link the positive isotopic δ18O excursion in the Southern 
Hemisphere near the EOT to opening of the DP, and here we show that AIS 
growth is a different mechanism for shifting the ocean's δ18O gradient. 

The change in the meridional temperature gradient and deep ocean cooling due to
glaciation is associated with circulation shifts in the atmosphere and ocean.
Introduction of the AIS lowers sea surface temperatures (SSTs; Extended Data
Fig. 3) and enhances the growth of sea ice along the Antarctic margin (Extended
Data Fig. 3). These cooler Antarctic coastal waters are also saltier because an
enhanced salt flux towards Antarctica within the Ekman layer (Fig. 3a and
Extended Data Fig. 4) weakens the pycnocline. These changes occur because a
shift in regional atmospheric circulation is induced by Antarctic glacial
topography (Fig. 3b), which intensifies the pressure gradient between Antarctica
and the surrounding ocean. The change in pressure gradient drives increases in
surface wind around Antarctica (Fig. 3b and Extended Data Fig. 5), enhancing
Ekman transport towards Antarctica (Fig. 3c and Extended Data Fig. 6). Surface
currents then subduct along the Antarctic coastline (Fig. 3d and Extended Data
Fig. 7), moving the cooler, denser water mass into the abyss. Downwelling of
high-salinity water increases at 60° S below the mixed layer, which contributes
to enhanced deep-water formation. These changes in oceanic and atmospheric
circulation in the southern high latitudes drive an enhanced meridional
overturning circulation (Extended Data Fig. 8). Our simulated change in ocean
circulation is consistent with increased northward transport of detected proxies
of Antarctic intermediate water (Fig. 3d). Previous modelling studies also found
increases in Ekman divergence and increased Atlantic deep water formation due to
the opening of the DP, although we find comparable shifts in circulation and
ocean temperatures without changing gateways or CO₂. These simulations reproduce
many trends in ocean temperature and circulation produced in previous studies
driven by CO₂ changes alone, but inclusion of AIS formation as a feedback
enables these shifts to occur with more realistic changes in CO₂.
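
The link between surface wind and Ekman transport described above can be
illustrated with the textbook steady-state Ekman relation, in which the
depth-integrated transport is at right angles to the wind stress (to the left
of the wind in the Southern Hemisphere). The sketch below is not the
calculation used in this study, whose Methods are not reproduced here; the
function names, the reference density and the example wind stress are
assumptions chosen only for illustration.

```python
import math

OMEGA = 7.2921e-5        # Earth's rotation rate (rad/s)
RHO_SEAWATER = 1025.0    # assumed reference seawater density (kg/m^3)


def coriolis_parameter(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude), in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def ekman_transport(tau_x, tau_y, lat_deg, rho=RHO_SEAWATER):
    """Return (eastward, northward) Ekman volume transport per unit width (m^2/s).

    tau_x, tau_y are the eastward and northward wind stress components (N/m^2).
    Uses the classical relations M_x = tau_y / (rho f), M_y = -tau_x / (rho f).
    """
    f = coriolis_parameter(lat_deg)
    if abs(f) < 1e-6:
        raise ValueError("Ekman balance is not valid close to the Equator")
    return tau_y / (rho * f), -tau_x / (rho * f)


# Hypothetical example: easterly (westward-directed) stress of 0.1 N/m^2 at 65 S,
# the kind of coastal wind that would push surface water towards Antarctica.
m_east, m_north = ekman_transport(tau_x=-0.1, tau_y=0.0, lat_deg=-65.0)
direction = "southward" if m_north < 0 else "northward"
print(f"Meridional Ekman transport: {m_north:.3f} m^2/s ({direction})")
```

With these sign conventions, stronger easterlies near the Antarctic coast give a
larger southward transport, whereas westerlies farther north give a northward
transport; the divergence between the two is what drives upwelling.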

The changed ocean circulation and structure increase ocean heat transport by
~0.3 PW (Fig. 4), or ~60% of the value at 45–60° S. Previous work found that a
20% increase in ocean heat transport could significantly affect the timing of
glacial inception and AIS stability. Thus, glaciation induces a negative
feedback by increasing southward ocean heat transport and heat fluxes along the
Antarctic margin, potentially capable of significant feedback onto the AIS
(Fig. 4). Such changes would also have significant impacts on other feedbacks,
for example those due to sea ice. This negative feedback may help to explain the
stepwise growth of the AIS and its early history of instability. AIS formation
is a leading-order feedback that should be considered in such studies. These
results have implications for estimates of future changes in ice sheets and sea
level, because the results demonstrate that ice sheets modify ocean
stratification, casting doubt on the assumptions used in some studies. Our
experiments opening the TG and DP gateways show weak changes in ocean heat
transport (Fig. 4), in agreement with previous work in coupled models.

Figure 4 | Zonally averaged poleward ocean heat transport. Shown are unglaciated
(orange line) and glaciated late Eocene (dark blue line) cases with CO₂
specified at 1,120 p.p.m.; unglaciated (light orange line) and glaciated (light
blue line) cases at 560 p.p.m. CO₂; anomaly (left y axis) due to glaciation at
1,120 p.p.m. CO₂ (dashed grey line) and 560 p.p.m. CO₂ (dashed black line); and
anomaly (right y axis) due to gateway opening at 1,120 p.p.m. CO₂ (dashed green
line).

Consequently, although debate continues on whether the DP and TG opened
shallowly in the Eocene or not, deep opening was probably not completed until
the Miocene. Our simulations show little sensitivity to gateway opening and
indicate that gateway opening is unlikely to explain the rapid, large changes
during the EOT (Fig. 2b and Extended Data Fig. 9). We have shown that the
AIS–ocean feedback mechanism parsimoniously explains existing data and is
consistent with reconstructed timing of events (Fig. 2). Our modelling results
support the hypothesis that a decrease in CO₂ cooled the world and initiated AIS
growth, which in turn increased the ocean’s thermal gradient and invigorated
circulation. Increases in wind-driven upwelling around Antarctica drove changes
in ocean productivity seen in high-latitude records; they also enhanced
ventilation, potentially drawing down carbon, and strengthened cooling across
the EOT. Our results suggest that the net impact of glaciation was mediated by
strong feedbacks, both negative and positive; a deeper understanding requires
improved modelling and data integration. Nevertheless, one thing is clear: the
AIS had a leading role in climate changes during glacial expansion across the
EOT, and the same dynamics and feedbacks are likely to be important in the event
of AIS retreat.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 

Community Earth System Modelling (CESM) framework. CESM1.0 is a widely used,
well-described and well-validated coupled model with atmosphere, land, sea-ice
and ocean model components. For the atmospheric component (CAM4) we use the
spectral dynamical core at T31 resolution (~3.7° × 3.7° per atmospheric grid
cell), and the ocean model resolution is 122 latitudes by 100 longitudes
(~1.8° × 3.6° per ocean grid cell, with much finer resolution near the Equator
and at high latitudes) with 25 unevenly spaced vertical levels in the ocean. The
CESM ocean component (POP2) uses anisotropic horizontal viscosity, the
Gent–McWilliams isopycnal mixing scheme and an updated implementation of the KPP
vertical mixing scheme. It also has an improved near-surface eddy flux
parameterization, which improves the effects of the diabatic mesoscale eddy
fluxes, and a revamped mixed-layer eddy parameterization, which improves the
restratification effects of the submesoscale mixed-layer eddies. The simulation
improvements compared with the previous version (CCSM3) include improved
representation of the El Niño–Southern Oscillation (ENSO) and more realistic
sea-ice coupling. The CESM framework is suitable for exploring EOT ocean
circulation because it has proved capable of reproducing modern ocean currents
and circulation, and we validated the main circulations produced in earlier
versions of the model.

Experimental method. For the atmosphere we create aerosol forcing files
specifically for the EOT; this process is described in previous work. We have
prescribed global vegetation in the EOT simulations to be the same as in our
earlier published studies, which restricts vegetative feedbacks. The ocean
temperatures for the glaciated and unglaciated EOT simulations are initialized
with a zonally averaged temperature distribution from the last 50 years of fully
coupled CCSM3 Eocene modelling simulations run for more than 3,000 years with
560 and 1,120 p.p.m. CO₂ (refs 7, 41). The CESM1.0 fully coupled modelling
simulations are then run for 3,400 years, with the last 75 model years used for
analysis. We conduct two glaciated Eocene simulations at 560 and 1,120 p.p.m.
CO₂ and two unglaciated simulations at 560 and 1,120 p.p.m. CO₂. For the gateway
simulations we branch two experiments from year 3,400 of the 1,120 p.p.m. CO₂
unglaciated Eocene simulation and run these experiments for a further >1,000
years. In the first gateway experiment we have both DP and TG open, and in the
second gateway experiment we have the DP and TG closed.

The CO₂ range of 1,120–560 p.p.m. was chosen because this is close to the
estimated change in atmospheric CO₂ across the EOT, from ~1,100 p.p.m. to lower
levels around 600 p.p.m. after the transition. The volume of the Eocene AIS that
we impose in the glaciation experiments is 20.3 × 10⁶ km³, ~75% the size of the
modern AIS. This is well within the estimated size for the AIS in the early
Oligocene, for which estimated EOT ice volume is between 40% and 120% of modern.
We diagnosed the surface mass balance of the imposed ice sheets and found them
to be positive; that is, the simulations are self-consistent. When an ice sheet
is imposed on Antarctica there is enough snow remaining in summer to keep the
ice sheet intact.

Deep ocean equilibrium. To ensure that the coupled model simulations are in
equilibrium, we employ a series of metrics. First, the surface energy balance is
close to 0 in all simulations (about −0.04 W m⁻² in the unglaciated simulations
and −0.06 W m⁻² in the glaciated simulations). Second, the volumetrically
integrated ocean temperature trend from the surface to 5 km is −0.016 °C per
century over the last century in unglaciated simulations. This means that during
a 100-year period the volumetrically integrated ocean temperature cooled by
0.016 °C. Third, a time series of surface and deep ocean temperature across the
last 2,400 years of the simulation shows no significant trend in the surface or
deep ocean temperature. Fourth, visualizations of glaciated and unglaciated
simulations plotting the temperature, salinity and ideal age trend for each
simulation demonstrate that the key patterns of interest in this study are
established early and are stable through the length of the simulation.
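
The second metric can be sketched as a volume-weighted mean followed by a
least-squares slope. This is a minimal illustration on synthetic numbers, not
the diagnostic code used for the simulations; the function names, the three
hypothetical grid cells and their volumes are assumptions, and only the
−0.016 °C per century figure is taken from the text.

```python
def volume_weighted_mean(temperatures, volumes):
    """Mean temperature of a set of grid cells, weighted by cell volume."""
    total_volume = sum(volumes)
    return sum(t * v for t, v in zip(temperatures, volumes)) / total_volume


def linear_trend(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic 100-year record: three hypothetical cells that all cool at the
# rate quoted in the text (-0.016 degC per century).
years = list(range(100))
cell_volumes = [1.0, 2.0, 3.0]       # arbitrary relative volumes
initial_temps = [20.0, 5.0, 1.0]     # arbitrary starting temperatures (degC)
rate_per_year = -0.016 / 100.0

mean_series = [
    volume_weighted_mean([t0 + rate_per_year * yr for t0 in initial_temps],
                         cell_volumes)
    for yr in years
]
trend_per_century = 100.0 * linear_trend(years, mean_series)
print(f"Trend: {trend_per_century:.4f} degC per century")
```

A trend this small relative to the multi-degree anomalies analysed in the study
is the sense in which the deep ocean is treated as equilibrated.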

Comparison with previous fully coupled EOT simulations. Previous coupled
atmosphere–ocean EOT simulations required a CO₂ decrease from 2,240 to
560 p.p.m. (outside the range of proxy reconstructions) to match temperature
proxy records. Here our change in CO₂ (1,120–560 p.p.m.) is more aligned with
CO₂ levels supported by proxies, and we find that the combined effect of
glaciation and CO₂ (comparing the unglaciated 1,120 p.p.m. CO₂ case with the
glaciated 560 p.p.m. CO₂ case) cools the surface and deep ocean by ~4 °C,
comparable to the surface cooling of 4.4 °C (Extended Data Fig. 1c) and benthic
cooling of 4 °C described in previous fully coupled modelling. We also find a
global mean surface temperature decrease of 4.3 °C, which compares well with the
recent estimate of a 4–6 °C decrease in temperature across the EOT. These brief
comparisons with previous work highlight the importance of including both AIS
and CO₂ forcing in EOT climate simulations.

Recent work exploring the sensitivity of zonal mean ocean heat transport (OHT)
to a wide range of alterations in boundary conditions, including ocean gateways,
robustly shows small changes in OHT, similar to the results presented in this
study.

δ¹⁸O calculations. δ¹⁸O values from the climate modelling were calculated
throughout by using the temperature and salinity fields from the ocean model,
using the empirical regression of ref. 46.

Carbon dioxide and ice-sheet forcing. Combining AIS and CO₂ forcing gives us our
closest historical analogue to the EOT, and this change produces a shift of ~1‰
in δ¹⁸O across the entire deep ocean basin (Extended Data Fig. 1). Results
suggest that glaciation explains roughly 70% of the change in δ¹⁸O across the
southern high latitudes (Fig. 2c), whereas CO₂ explains the δ¹⁸O change in the
northern high latitudes (Extended Data Fig. 1).

The combination of both CO₂ and glaciation induces a more homogeneous cooling of
both the surface and deep ocean (Extended Data Fig. 1c). This figure also shows
that the enhanced vertical δ¹⁸O gradient exists (Extended Data Fig. 1f) even
when both forcings (CO₂ and AIS) are combined, although the signal is reduced as
a result of inclusion of the CO₂ forcing. This highlights the importance of
separating the different forcings to fully understand their relative impact on
temperature, δ¹⁸O and ocean circulation. Recent work for the Middle Miocene
transition has highlighted the important connections between ice sheet growth,
changes in surface wind fields, and subsequent shifts in ocean circulation. The
study was able to show that the growth of a large, modern-sized ice sheet during
the Middle Miocene is capable of producing deep ocean cooling while having a
modest global surface temperature change.

Extended Data Figure 1 | Ocean temperature and δ¹⁸O anomalies. a–c, Zonally
averaged temperature anomalies (°C) averaged over all longitudes: glaciated
minus unglaciated cases (CO₂ constant at 560 p.p.m.) (a), 1,120 minus 560 p.p.m.
CO₂ cases (both unglaciated) (b), and unglaciated case with 1,120 p.p.m. CO₂
minus glaciated case with 560 p.p.m. CO₂ (c). d–f, δ¹⁸O comparisons (per mil)
zonally averaged over the Atlantic basin; otherwise as in a–c.

Extended Data Figure 2 | Ocean temperature and δ¹⁸O anomalies in Atlantic Ocean
basin due to Southern Ocean gateway opening. a, Temperature anomaly (°C) for
both gateways opened minus both gateways closed. b, Temperature anomaly for DP
and TG closed minus DP open and TG closed. c, Temperature anomaly for DP and TG
open minus TG closed DP open. d–f, As in a–c, except for δ¹⁸O.

Extended Data Figure 3 | Absolute sea-ice and anomalous sea surface temperature.
a, b, Sea ice fraction for glaciated (a) and unglaciated (b) cases.
c, Glaciated minus unglaciated sea surface temperature anomaly. All simulations
with 1,120 p.p.m. CO₂.

Extended Data Figure 4 | Absolute salinity fields. Salinity (colour contour) and
salinity flux (vectors) for glaciated late Eocene (a) and unglaciated cases (b)
at 1,120 p.p.m. CO₂.

Extended Data Figure 5 | Absolute sea level pressure and surface wind. Sea level
pressure (colour contour) and surface wind (vectors) for glaciated (a) and
unglaciated (b) cases at 1,120 p.p.m. CO₂.


©2014 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 6 | Absolute Ekman pumping and transport. Ekman pumping
contour and Ekman transport overlaid as vectors for glaciated (a) and
unglaciated (b) cases at 1,120 p.p.m. CO₂. See the calculations for Ekman
pumping and transport in Methods.

Extended Data Figure 7 | Absolute ocean currents. Zonally averaged ocean
currents (meridional and vertical) across the Atlantic Ocean. The vertical ocean
velocities are scaled by a constant coefficient (500) for plotting purposes.
Glaciated (a) and unglaciated (b) cases at 1,120 p.p.m. CO₂.

Extended Data Figure 8 | Meridional overturning circulation. Zonally averaged
meridional overturning circulation for glaciated (a) and unglaciated (b) cases,
and glaciated minus unglaciated anomaly (c), at 1,120 p.p.m. CO₂.

Extended Data Figure 9 | Depth–latitude plot for the non-interpolated and
interpolated δ¹⁸O proxy record anomalies. a, Raw δ¹⁸O anomalies.
b, Interpolated δ¹⁸O anomalies (see Extended Data Table 1).

Extended Data Table 1 | Site locality name, palaeolatitude, palaeolongitude,
palaeodepth and δ¹⁸O

Site      Palaeolatitude (°)  Palaeolongitude (°)  Palaeodepth (m)  δ¹⁸O (‰)
DSDP366         5.74               -23.17               2710          0.99
DSDP522       -31.69                -3.82               3039          0.58
DSDP529       -34.29                -1.87               1672          0.97
DSDP549        45.90               -13.10               1659          0.59
ODP689        -65.11                 2.88               1650          0.99
ODP744        -60.41                78.54               2313          0.97
ODP630        -66.24                 1.75               2531          0.18
ODP748        -57.42                77.72                800          0.74
1053A          28.10               -69.90                400         -0.28
ODP738        -61.05                81.53               1750          1.1

Proxy records compiled from refs 2, 4. Positive values indicate an increase in
δ¹⁸O going from the late Eocene (34–36 Myr ago) into the early Oligocene
(31–29 Myr ago).


Widespread mixing and burial of Earth’s Hadean crust by asteroid impacts


S. Marchi, W. F. Bottke, L. T. Elkins-Tanton, M. Bierhaus, K. Wuennemann,
A. Morbidelli & D. A. Kring


The history of the Hadean Earth (~4.0–4.5 billion years ago) is poorly
understood because few known rocks are older than ~3.8 billion years. The main
constraints from this era come from ancient submillimetre zircon grains. Some of
these zircons date back to ~4.4 billion years ago when the Moon, and presumably
the Earth, was being pummelled by an enormous flux of extraterrestrial bodies.
The magnitude and exact timing of these early terrestrial impacts, and their
effects on crustal growth and evolution, are unknown. Here we provide a new
bombardment model of the Hadean Earth that has been calibrated using existing
lunar and terrestrial data. We find that the surface of the Hadean Earth was
widely reprocessed by impacts through mixing and burial by impact-generated
melt. This model may explain the age distribution of Hadean zircons and the
absence of early terrestrial rocks. Existing oceans would have repeatedly boiled
away into steam atmospheres as a result of large collisions as late as about
4 billion years ago.

Terrestrial planet formation models indicate the Earth went through a sequence
of major growth phases: accretion of planetesimals and planetary embryos over
many tens of millions of years (see, for example, ref. 6), culminating in a
final giant impact that led to the formation of our Moon (see, for example,
ref. 7). This was followed by the late accretion of left-over planetesimals that
probably contributed less than 0.5% of the Earth’s present-day mass. Although
the role of late accretion impacts on the Hadean Earth has long been discussed
(for example, in ref. 8), the precise nature of the impactor flux during late
accretion is elusive. Estimates from the abundance of highly siderophile
elements (HSEs, such as Re, Au, Os and Ru) in mantle-derived peridotites
indicate that ~(0.7–3.0) × 10²² kg of material with broad chondritic composition
was added to the Earth, probably during the late accretion phase (see Methods).

An additional constraint on this flux comes from the ratio of HSEs found in the
mantles of the Earth and Moon. Studies of terrestrial and lunar samples suggest
that the ratio of the mass of broadly chondritic material accreted by the Earth
and the Moon is probably ≥700:1 (refs 9, 10). By modelling the impactor flux on
both worlds, it has been argued that this ratio was a reasonable outcome of
stochastic accretion, with most HSEs added to the Earth by massive impactors
that were statistically unlikely to strike the smaller Moon. This scenario was
recently found to be broadly consistent with the current generation of models of
terrestrial planet formation.

Figure 1 | Mass accreted by the Earth during the late accretion phase. a, Cyan
curves show 50 representative Monte Carlo simulations corresponding to the
main-belt size-frequency distribution truncated at Ceres (see Extended Data
Fig. 1). Each data point indicates the total mass (left y axis) and equivalent
diameter (right y axis) accreted in that time bin (of 25 Myr each). The
cumulative accreted mass (or equivalent size) in the period 3.5–4.5 Gyr ago is
indicated by the dots at the right of the panel. We assumed a projectile density
of 3,000 kg m⁻³. The horizontal green lines mark the lower, most probable and
upper limits for the accreted mass as inferred from HSEs. b, As in a, but for an
impactor size-frequency distribution extrapolated to 4,000 km before 4.15 Gyr
ago (see Extended Data Fig. 1). Simulations that deliver a mass within the HSE
range are in red; those in excess of the maximum limit are in blue (the
corresponding percentages of simulations are indicated). These simulations were
retained for further analyses.

Here we assess the early Earth’s impact history by rescaling a recent estimate
of the lunar impact flux to Earth. The advantages of this approach are numerous.
First, the Moon provides a much clearer record of the early impact history of
the Earth–Moon system. Moreover, the lunar cratering record provides an absolute
impactor flux that is independent of assumptions made by terrestrial planet
formation models. The rescaling was done by transforming lunar craters into a
projectile flux, with the flux used to estimate the number of terrestrial
impacts taking place in intervals of 25 Myr between 3.5 and 4.5 Gyr ago (see
Methods). For the purpose of our work we assume that the Moon-forming impact was
at ~4.5 Gyr ago, but our results are insensitive to the exact timing. The sizes
of the projectiles were randomly drawn from an assumed impactor size-frequency
distribution (SFD) with a shape similar to that of large main-belt asteroids.
The cut-off of this population was varied for different time intervals.
Impactors striking after ~4.15 Gyr ago, the putative starting time of the late
heavy bombardment (LHB), were given a maximum cut-off of 1,000 km, roughly
corresponding to the largest present-day asteroid, Ceres. The impactor SFD
before ~4.15 Gyr ago is assumed to have larger left-over planetesimals. We
assumed a similar impactor SFD with a cut-off threshold at 4,000 km, which was
extrapolated from the SFD of inner and central main-belt asteroids ranging from
a few hundred kilometres to 1,000 km (see Methods and Extended Data Figs 1
and 2).
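
The random drawing of projectile sizes from a truncated SFD can be sketched as
follows. This is an illustration and not the code used in this work: for
simplicity it assumes a single-slope cumulative power law, N(>D) ∝ D^(−q),
whereas the model described above uses an SFD shaped like that of large
main-belt asteroids. The slope `q`, the number of impactors and the function
names are hypothetical; only the 15 km minimum diameter, the 1,000 km and
4,000 km cut-offs and the 3,000 kg m⁻³ density come from the text.

```python
import math
import random

DENSITY_KG_M3 = 3000.0  # projectile density quoted in the text


def draw_diameter_km(rng, d_min_km, d_max_km, q):
    """Inverse-transform sample from a power law N(>D) ~ D**-q truncated to [d_min, d_max]."""
    u = rng.random()
    a = d_min_km ** (-q)
    b = d_max_km ** (-q)
    return (a - u * (a - b)) ** (-1.0 / q)


def sphere_mass_kg(diameter_km):
    """Mass of a spherical projectile of the given diameter."""
    radius_m = 0.5 * diameter_km * 1000.0
    return DENSITY_KG_M3 * (4.0 / 3.0) * math.pi * radius_m ** 3


def simulate_accreted_mass(n_impactors, d_max_km, q=2.0, d_min_km=15.0, seed=0):
    """Draw n_impactors diameters and return (diameters, total accreted mass in kg)."""
    rng = random.Random(seed)
    diameters = [draw_diameter_km(rng, d_min_km, d_max_km, q)
                 for _ in range(n_impactors)]
    total = sum(sphere_mass_kg(d) for d in diameters)
    return diameters, total


# Compare the two cut-offs discussed in the text for one hypothetical realization.
for cutoff_km in (1000.0, 4000.0):
    sizes, total_mass = simulate_accreted_mass(1000, cutoff_km)
    print(f"cut-off {cutoff_km:.0f} km: largest {max(sizes):.0f} km, "
          f"total {total_mass:.2e} kg")
```

Repeating such a draw many times with different seeds gives a distribution of
accreted masses, which is the quantity compared against the HSE-derived range.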

Figure 2 | Spatial distribution and sizes of craters formed on the early Earth.
a, Mollweide projections of the cumulative record of craters at four different
times. Each circle indicates the final crater size estimated from the transient
cavity size from our simulations and a conservative estimate for the
transient-to-final crater size scaling (see Methods). The maps do not show
ejecta blankets and melt extrusion on the surface, which can greatly expand the
effects of cratering; they also do not account for a hotter early geotherm,
which would also result in larger crater sizes (see Methods). The colour coding
indicates the time of impact. The smallest projectiles considered have a
diameter of 15 km. We assumed an impact velocity of ~16 and ~25 km s⁻¹ before
and after 4.15 Gyr ago, respectively (see Methods), and a most likely impact
angle of 45°. b, As in a, but including melt extrusion on the surface as
discussed in the text.

Using a Monte Carlo code, we repeated this procedure ~5,000 times to address the
stochastic variability intrinsic to late accretion projectiles, and computed the
accreted mass (Fig. 1). A key result is that, for the case of impactor SFD
cut-off at 1,000 km, the total delivered mass is always below the range expected
from HSEs (Fig. 1a), whereas a significant fraction of simulations (~30%) fall
in this range when the cut-off is at 4,000 km before 4.15 Gyr ago (Fig. 1b). We
assumed that all projectiles and their HSEs were fully accreted into the
silicate Earth. Although this assumption is probably true for small projectiles
that disintegrate in the impact event, larger objects may not deliver their HSEs
efficiently to the mantle in all circumstances. Assuming that up to 50% of the
cores of planetesimals ≥2,000 km may be lost (see Methods), we compute that the
corresponding total percentage of successful simulations may be as high as ~40%.
These results elucidate several important features of late accretion. The bulk
of the mass (99% and 90%) is delivered by the ~17 ± 5 and ~6 ± 3 largest
projectiles, respectively, and the largest impactors can exceed ~3,000 km. This
is therefore a highly stochastic regime, in which a few projectiles dominate the
budget of HSEs delivered to Earth, in agreement with previous findings.
Moreover, by tracking the timing of the impacts, we also find that most of the
mass is typically accreted over a significant fraction of the Hadean
(~4.2–4.5 Gyr ago), suggesting that the early Hadean silicate Earth could have
had a substantially different budget of HSEs and trace elements compared with
the current Earth (see Methods). Note also that less than 0.5% of the
simulations deliver more than 1% of an Earth mass.
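
The statement that a handful of projectiles carries most of the mass can be
checked for any simulated population by counting how many of the largest bodies
are needed to reach a given mass fraction. The helper below is a hypothetical
sketch, and the example masses are made-up numbers in arbitrary units, not
output from the model.

```python
def n_largest_for_fraction(masses, fraction):
    """Smallest number of the largest bodies whose summed mass reaches
    `fraction` of the total mass."""
    total = sum(masses)
    running = 0.0
    for count, mass in enumerate(sorted(masses, reverse=True), start=1):
        running += mass
        if running >= fraction * total:
            return count
    return len(masses)


# Made-up population in arbitrary mass units, dominated by two large bodies.
example_masses = [100.0, 50.0, 10.0, 5.0, 1.0]
print(n_largest_for_fraction(example_masses, 0.90))
print(n_largest_for_fraction(example_masses, 0.99))
```

Applied to each Monte Carlo realization, the spread of these counts across runs
would give the kind of mean and scatter quoted for the 90% and 99% mass levels.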

Our terrestrial bombardment model also sheds light on the role of impacts on the
geological evolution of the Hadean Earth, with particular emphasis on mixing,
burial and melting of the uppermost layers. To model these effects
quantitatively, we performed a suite of impact simulations with the Simplified
Arbitrary Lagrangian Eulerian shock-physics code (iSALE), and computed the
resulting excavation cavity size, excavation volume and depth, and the volume of
target melt. We varied the target temperatures and considered impactor diameters
ranging from 15 to 4,000 km with a range of impact velocities (see Methods).

A key process is the impact-generated mixing as a result of the excavation and
collapse of large transient cavities in the lithosphere. We found that before
~4.4 Gyr ago up to 60–70% of the Earth’s surface was reworked to a median depth
of 20 km (Fig. 2a). Thus, our model predicts prolonged crustal reworking and
mixing of various components, as inferred from recent Pb–Hf isotope systematics
of Hadean zircons.

Melting of the target is an additional important process. We computed 
the melt produced by shock pressure, using both analytical estimates’”"* 
and iSALE simulations. Our simulations agree well with analytical 
estimates for impactor sizes below 100 km, but deviate significantly for 
larger impactors (Fig. 3a) (see Methods and Extended Data Figs 3 and 4). 
Both methods neglect impact-induced decompression and subsequent 
adiabatic melting of rising material in the mantle, which increase the 
total volume of melt. These processes have been quantitatively mod- 
elled for impactor diameters smaller than 100 km (ref. 19), and here we 
use their predictions for an impactor 100 km in diameter and then ex- 
trapolate their results to larger projectiles (Fig. 3a). Note that the melt 
volume is a lower limit for large projectiles that are expected to induce 
major mantle perturbations, resulting in voluminous adiabatic melting”. 
Computed melt volumes greatly exceed the volumes of current flood 
basaltic provinces’’. Some fraction of the mantle melt will erupt; through 
isostatic adjustment, melt may be expelled from the shallowing crater 
onto the planetary crust’. Melt spreading is also aided by the dynamics 
of cavity collapse in a hotter crust”!, such as that envisioned to have oc- 
curred during the Hadean”’. Assuming that the impact-generated melt 


10° 


108 


Volume (km°) 


10” 


10° 


100 


1,000 


Impactor 


Figure 3 | Melt production by large impacts on the Earth. a, Impact-shock 
melt volume from analytical estimates'® (red line), and impact-generated melt 
volume (including shock and decompression melting) for impacts simulated 
with iSALE (black dots). These simulations assumed a planar target, a 
lithospheric thickness of 125 km, a mantle potential temperature of 1,400 °C, an 
impact velocity of 12.7kms™' (corresponding to 18 kms~' for an impact angle 
of 45°). The mantle potential temperature of the Hadean Earth may have been 
hotter than assumed here by ~200 °C (see Methods and Extended Data 
Table 1). We also ran simulations for a mantle potential temperature of 
1,600 °C and found that the melt volume increased by 75% (see Methods). The 


580 | NATURE | VOL 511 | 31 JULY 2014 


flows on the surface, we used the estimated melt volumes to calculate 
that the corresponding diameter of a spherical cap with a thickness of 
3 km (comparable to large terrestrial igneous provinces) is ~20-30- 
fold that of the impactor diameters (see Fig. 3b). As a result of melt 
spreading, lithologies previously exposed at the surface are buried over 
large areas. 

The effects of melt burial due to impacts are shown in Fig. 2b. The 
cumulative fraction of Earth’s surface buried by impact-generated melt 
is 70-100% since 4.15 Gyr ago, and it increases to 400-600% during 
the period 4.15-4.5 Gyr ago. These findings do not preclude the possi- 
bility of having large unaffected surface areas at any given time step 
(see Fig. 2b, Extended Data Fig. 5), a condition required for liquid 
water to be stable at the surface as indicated by 5'°O measurements 
in Hadean zircons**™. 

Additional constraints on the terrestrial bombardment flux may come 
from trace elements entrapped in Hadean zircons. Their rare-earth ele- 
ments, U-Pb and Pb-Hf ages and Lu/Hf ratios point to significant mix- 
ing of mafic and felsic reservoirs (see, for example, ref. 16). This mixing 
is sometimes attributed to volcanism or subduction*"*”, in which weath- 
ered upper crustal reservoirs are buried at depth. Once buried, the dif- 
ferent components melt, and the resulting magma can then crystallize 
Hadean zircons’. It is unclear, however, whether these processes can 
readily explain the observed age distribution of Hadean zircons, which 
is characterized by a well-defined peak at 4.1-4.2 Gyr ago and a lack of 
ages older than ~4.4 Gyr (ref. 2). Moreover, it is unresolved whether 
the lack of ultra-ancient zircons implies that the right conditions for 
zircon formation were not met during this time, or whether zircons 
older than ~4.4 Gyr did not survive subsequent evolution. 

Our model shows that substantial burial could be achieved by impact- 
generated melt. Assuming that burial is required to make Hadean 
zircons*'*”, we investigated whether the burial by impact-generated 
melt could explain the Hadean zircons age distribution. In our Monte 
Carlo code, each simulated impact was assumed to bury surface lithol- 
ogies within a threshold distance proportional to the projectile diameter 
d multiplied by a factor f (~fd). This process results in increased crustal 
temperatures over a large annulus around the impact site, possibly lead- 
ing to eutectic melting of buried wet crustal material, in agreement with 
the observation that many of the Hadean zircons probably crystallized 
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green diamond represents the total melt volume (including decompression and 
adiabatic melting) from ref. 19 for a crater 800 km in diameter (corresponding 
to an impactor of 100 km for the assumed impact conditions) and a mantle 
potential temperature of 1,450 °C. The latter data point is 2.3-fold higher than 
our data point. Assuming that a similar scaling holds for larger projectiles, we 
obtain the green curve. The curves neglect adiabatic melting and therefore 
provide a lower limit for the total impact-generated melt. b, Ratio of surface 
melt diameter (for a thickness of 3 km) to impactor diameter (f). The horizontal 
grey area indicates f = 20-30, as discussed in the text. Symbols as in a. 
from wet eutectic melts*. The increase in crustal temperatures is caused
by the hotter geotherm of the buried material produced by melt above 
and possible thinning from the bottom, the rise of mantle melts into the 
fractured and tectonized lithosphere close to the crater’s rims'®”’, and 
possibly also by dripping crustal diapirs originating from the thick 
surface melt layer at larger radial distances”. 

We find that for f ~ 20-30 and impactor diameters larger than 100 km,
as predicted by our simulations, the resulting surface age distribution
matches the Hadean zircon age distribution well (see Fig. 4). In con-
trast, simulations having f < 20 or f = 40 fail to reproduce the data (see
Methods). This fit, if not a coincidence, tells us that large projectiles 
were capable of making zircons and resetting pre-existing ones over 
regions well beyond their computed crater rims, in agreement with our 
estimates of impact-generated melt (Fig. 3). 
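
To give a feel for the burial footprint implied by these values of f, the following minimal sketch estimates the fraction of the Earth's surface covered by a single impact. It assumes, as an interpretation rather than something stated explicitly above, that the buried region is a spherical cap whose diameter is f times the projectile diameter d. The function name and the Earth radius of 6,371 km are illustrative choices, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def buried_surface_fraction(d_km, f):
    """Fraction of Earth's surface inside a spherical cap of diameter f*d."""
    cap_radius_km = 0.5 * f * d_km            # arc length from impact point to cap edge
    theta = cap_radius_km / EARTH_RADIUS_KM    # angular radius of the cap (radians)
    theta = min(theta, math.pi)                # a cap cannot exceed the whole sphere
    return 0.5 * (1.0 - math.cos(theta))

for f in (20, 30):
    frac = buried_surface_fraction(100.0, f)
    print(f"d = 100 km, f = {f}: {100 * frac:.2f}% of the surface")
```

Under these assumptions a single 100-km projectile affects of the order of one per cent of the surface, so global reprocessing requires the cumulative effect of many large impacts.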

Several observations may be explained as consequences of the proposed
mechanism for Hadean zircon formation. First, given that large impac-
tors could have struck at relatively late times (for example the expected
surge of projectiles at 4.15 Gyr ago via the LHB), zircon production 
through these impact-generated processes could have occurred for many 
hundreds of millions of years, as observed’”’. Attrition among the oldest 
zircons was pronounced, because they were subject to high temperatures 
near numerous impact locations”, burial at depth by melt and ejecta, 
and redistribution to the upper crust. Most were reset or destroyed. Con- 
sequently, our model predicts that the paucity of zircons older than 
4.4 Gyr is expected from collisional processes. Moreover, the formation 
of Hadean zircons at depth in large annuli around major impacts pro- 
vides a ready explanation for their lack of clear signs of impact shock”?’ 
that are commonly observed among younger zircons. Another attract- 
ive aspect of our model is that it explains the mixing of the protoliths 
from which Hadean zircons crystallized, as inferred from the Hf-isotope 
record”. Finally, the volume of the buried lithologies at any time step is 
about an order of magnitude higher than the fraction of the crust melted 
by small impacts, indicating that the zircon crystallization from impact 
melts was negligible (see Methods), in agreement with the low crystal- 
lization temperatures observed in Hadean zircon ***". 

We argue that the peak of Hadean zircon ages at 4.1-4.2 Gyr reflects 
the onset of the LHB, as suggested by meteorite Ar-Ar shock degassing
ages and other data’*"*. Indeed, we find that a scenario with an LHB spike
at significantly younger ages, say 3.9 Gyr ago, is inconsistent with Hadean zircon

Figure 4 | Detrital Hadean zircon ages compared with the computed
distribution of impact-generated ages. Zircon ages (coloured curves
correspond to different data sets: orange, ²⁰⁷Pb-²⁰⁶Pb ages’; blue, U-Pb ages”’;
green, ²⁰⁷Pb-²⁰⁶Pb ages'®; red, U-Pb ages”) show a distinct peak at
~4.1-4.2 Gyr ago. In agreement with our iSALE simulations, the distribution of
impact-generated ages (black line, shaded area) is computed for f = 30 for
projectiles larger than 100 km and f = 9 for projectiles smaller than 100 km, and
is an average of 50 successful Monte Carlo simulations (Fig. 1). All distributions
are normalized to unit area.

age distributions. A similar conclusion is reached for a steadily declining
bombardment (no LHB) scaled to match the abundance of lunar HSEs 
(see Methods). Therefore, LHB-era impactors provide a natural expla- 
nation for the clustering of Hadean zircon ages that would otherwise 
require ad hoc endogenic conditions (for example increased subduc- 
tion or volcanism rates). 

The new picture of the Hadean Earth emerging from our work has 
important implications for its habitability. Before ~4 Gyr ago, no sub- 
stantial large region of the Earth’s surface could have survived untouched 
by impacts and associated outcomes. Large impacts had particularly 
severe effects on extant ecosystems. We find that the Hadean was plau- 
sibly characterized by one to four impactors larger than 1,000 km capa- 
ble of global sterilization’, and by three to seven impactors larger than 
500 km capable of global ocean vaporization®. The median time for the 
latest impactor larger than 500 km to hit the Earth was ~4.3 Gyr ago. 
In ~10% of the simulations, this could be as recently as ~4 Gyr ago 
(Extended Data Fig. 6), depending on various assumptions. Thus, life 
emerging during the Hadean was probably resistant to high tempera- 
tures and was capable of spreading from the stable niches that existed at 
that time. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Impactor size-frequency distribution and flux. The SFD of bodies colliding 
with the early Earth cannot be directly constrained because subsequent geological 
evolution has erased the signatures of those impacts. Instead, we turn to very old 
cratered terrains found on the Moon, Mars and Mercury. The crater SFDs observed
on those terrains have been used to constrain the shape of the impactor SFD in
ancient times. The earliest visible populations of craters have a characteristic SFD
resembling that of the current main asteroid belt'***-°*. In this work we assume that
a similarly shaped impactor SFD was striking the early Earth. Although it is natural
to assume that the Earth and Moon have been exposed to the same impactor flux, a
critical aspect of our work concerns the impactor SFD at large sizes. In fact, because the
geometrical cross-section of the Earth is larger than that of the Moon, larger
projectiles may have hit the Earth than the Moon. For comparison, the largest confirmed
impact structure in the inner Solar System, the ~2,500-km South Pole-Aitken
basin on the Moon, was produced by a projectile ~170 km across”’. It is possible
that larger objects struck the early Earth, in particular if the shape of the impactor
SFD was shallow for large objects (see, for example, ref. 10). Therefore the impactor
SFDs derived from crater populations observed on the terrestrial planets need to
be extrapolated to larger sizes if they are to be applied to the Earth. For the reasons
discussed above, we considered a main-belt-like SFD up to ~1,000 km (Ceres), and
we also considered an extension of the main-belt SFD up to 4,000 km (see the text 
and Extended Data Fig. 1 for further details). 

The results of our work are not sensitive to the fine details of the shape of the 
impactor SFD. The more important issue is that the population of left-over pla- 
netesimals had large enough impactors (diameter > 1,000 km) to allow it to repro- 
duce the abundance of terrestrial HSEs. Here we consider a cut-off of 4,000-km 
and 1,000-km projectiles for impactors striking before and after 4.15 Gyr ago, 
respectively. The number of large objects in the population can conceivably be 
constrained by the largest lunar basin, provided that these impacts took place after 
the formation of the Moon’s crust. Our modelling work indicates that the Earth 
was hit by 25-45 impactors larger than 200 km. Assuming an Earth-to-lunar scaling
of ~20:1 (ref. 13), we obtain one or two South Pole-Aitken-forming impactors
hitting the Moon. This is consistent with the lunar basin record and the lunar
HSEs. We also estimate that an average of five impactors larger than 500 km hit the
Earth. This translates into a ~70% (= 1 − 6/20) probability that the Moon escapes
these impacts. These numbers provide a sanity check that our assumed impactor
SFD is compatible with available constraints.
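
The arithmetic behind this sanity check can be reproduced with a short sketch. The function and variable names are illustrative; the only inputs are the numbers quoted in the paragraph above, and the escape probability is evaluated exactly as the expression is printed there.

```python
EARTH_TO_MOON_RATIO = 20.0  # ~20:1 Earth-to-lunar scaling quoted in the text (ref. 13)

def expected_lunar_impacts(n_earth, ratio=EARTH_TO_MOON_RATIO):
    """Scale a number of terrestrial impacts to the expected number on the Moon."""
    return n_earth / ratio

low = expected_lunar_impacts(25)
high = expected_lunar_impacts(45)
print(f"Lunar impactors larger than 200 km: {low:.2f} to {high:.2f}")

# Escape probability as printed in the text: 1 - 6/20
p_escape = 1.0 - 6.0 / EARTH_TO_MOON_RATIO
print(f"Probability that the Moon escapes: {p_escape:.0%}")
```

The scaled range of 1.25 to 2.25 lunar impactors corresponds to the "one or two" South Pole-Aitken-forming events mentioned above.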

Note also that in an alternative scenario for the origin of the HSEs found within 
the mantles of the Earth and Moon”, it has been argued that a thin, dynamically 
cold disk of small bodies spread across the terrestrial planet region would allow the 
Earth to accrete much more mass than the Moon. This scenario requires that a
very thin disk be maintained during the planet formation era until the giant impact
that made the Moon took place, probably many tens of millions of years to
perhaps 100 Myr after the formation of Ca,Al-rich inclusions. It is unclear to us how
this disk avoided dynamical excitation from planetary perturbations for this long 
interval (well after the solar nebula had dissipated). Beyond this, concerning HSEs 
on or in the Moon, this scenario would produce high enrichment in HSEs in the 
lunar crust (which is not observed) and it does not explain how the HSEs within 
the small bodies would breach the crust to reach the Moon’s mantle. There is also 
no explanation provided of how this scenario would produce the crater SFDs 
found on ancient lunar terrains”. 

Concerning the terrestrial impact rate, we considered several scenarios. The 
nominal model assumes the so-called lunar sawtooth bombardment profile for the 
Moon‘ extrapolated to the Earth. For this, we took the lunar impactor flux (as 
derived from the observed number of craters as a function of time), defocused it for 
the lunar gravitational field and Earth’s gravitational field at the current lunar 
orbit, and then applied the Earth’s gravitational focusing. More specifically, the 
crater size-frequency distribution on ancient lunar terrains has been converted 
into projectiles assuming first, a crater-to-projectile size scaling law for hard-rock”; 
second, a lunar impact velocity of 11 km s⁻¹ for the period between 4.15 and
4.5 Gyr ago; and third, a lunar impact velocity of 22 km s⁻¹ for the period between
3.5 and 4.15 Gyr ago. The factor of ~2 increase in the impact velocity comes from
observations of lunar crater populations* and is in agreement with dynamical
estimates of terrestrial planet accretion’’ and how projectiles in the inner Solar
System may have reacted to late migration of the giant planets'*. The correspond-
ing terrestrial impact velocities are ~16 and ~25 km s⁻¹. The projectile flux and
impact velocity were then rescaled to the Earth, assuming that impactors pro- 
ducing the oldest visible lunar craters were not affected by Earth’s gravitational 
focusing. This approximation is valid for most of the Moon’s orbital evolution 
except for the first few million years (ref. 39), which are not relevant to the work 
presented here. This procedure gives a flux scaling factor for the Earth and Moon of 
~1.24 and ~1.90 (per unit surface), respectively, for the LHB and pre-LHB times 
for the assumed impact speed at infinity. As detailed in the text, our extrapolation is 
done by rescaling the lunar flux to the Earth. Here we assume that the LHB occurred
at ~4.15 Gyr ago (the median value of the 4.1-4.2-Gyr interval of acceptable values, 
as concluded in refs 4, 13). The corresponding lunar flux curve is shown by the red 
line in Extended Data Fig. 2. We also considered the case of a narrow intense spike 
of LHB impacts at 3.9 Gyr ago, roughly corresponding to the scenario discussed in 
ref. 40 (cyan curve). Recent work has suggested that this LHB is unlikely to fit 
constraints*'*'**!, However, this assumption is still adopted by some researchers. 
Both of our impact flux curves were obtained by requiring that the integral of the 
accreted mass on the Moon match the abundance of lunar HSEs inferred to exist in 
the Moon’s mantle, appropriately corrected to take into account partial accretion‘. 
Finally, for the sake of completeness, we also considered two scenarios that exclude 
the LHB. The first is a simple extrapolation of the nominal case up to 4.5 Gyr ago 
(black curve). Note that here the total mass accreted by the Moon exceeds that 
predicted by the HSEs by a factor of 3-4; therefore ref. 4 concluded that this scenario
is unlikely. To compensate for this, we also considered a rescaled flux, reduced to
about one-third of this extrapolation, to match the HSE constraint (green curve). All flux curves start
at 4.5 Gyr ago, which is the assumed time for the formation of the Moon; however, 
our results are insensitive to the exact timing of Moon formation. We discuss how 
these curves compare with terrestrial zircon data in the following. 
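
The defocusing and refocusing steps described above for rescaling lunar impact velocities and fluxes to the Earth can be sketched with the standard two-body gravitational focusing relations. This is a minimal illustration, not the authors' code: the escape velocities are assumed textbook values, the function names are hypothetical, and the focusing factor 1 + v_esc²/v_inf² is the usual approximation for the flux per unit surface.

```python
import math

V_ESC_EARTH = 11.19         # km/s, escape velocity at Earth's surface (assumed)
V_ESC_MOON = 2.38           # km/s, escape velocity at the Moon's surface (assumed)
V_ESC_EARTH_AT_MOON = 1.44  # km/s, Earth's escape velocity at the current lunar distance (assumed)

def v_infinity_sq(v_lunar_impact):
    """Squared speed at infinity: remove the lunar and terrestrial acceleration
    (the latter evaluated at the lunar orbit) from a lunar impact speed."""
    return v_lunar_impact**2 - V_ESC_MOON**2 - V_ESC_EARTH_AT_MOON**2

def earth_impact_velocity(v_lunar_impact):
    """Impact speed on the Earth after applying Earth's gravitational focusing."""
    return math.sqrt(v_infinity_sq(v_lunar_impact) + V_ESC_EARTH**2)

def flux_ratio_per_unit_area(v_lunar_impact):
    """Earth-to-Moon flux ratio per unit surface from the factor 1 + v_esc^2 / v_inf^2."""
    vinf2 = v_infinity_sq(v_lunar_impact)
    earth = 1.0 + V_ESC_EARTH**2 / vinf2
    moon = 1.0 + (V_ESC_MOON**2 + V_ESC_EARTH_AT_MOON**2) / vinf2
    return earth / moon

for label, v_moon in (("pre-LHB", 11.0), ("LHB", 22.0)):
    v_earth = earth_impact_velocity(v_moon)
    ratio = flux_ratio_per_unit_area(v_moon)
    print(f"{label}: Earth impact velocity {v_earth:.1f} km/s, flux ratio {ratio:.2f}")
```

With these assumed constants the sketch gives terrestrial velocities and flux ratios within a few per cent of the values quoted above (~16 and ~25 km s⁻¹; ~1.90 and ~1.24). The small differences presumably reflect details of the original calculation that are not given here.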

Terrestrial budget of HSEs. Terrestrial impactors may not be fully accreted, with
implications for the delivery of HSEs. For instance, this may be the case for large graz-
ing projectiles’. In addition, projectiles larger than ~2,000 km may sequester mantle
HSEs into Earth’s core*’, whereas high-resolution smoothed-particle hydrodyn- 
amics simulations show that as much as ~50% of the core of a large differentiated 
impactor may plunge into the Earth’s core (R. M. Canup, personal communication, 
August 2013). Thus, in scenarios involving collisions with projectiles larger than 
~2,000 km the accreted mass would be higher than that estimated by HSEs (see 
Fig. 1). 

Moreover, is it possible that some terrestrial HSEs predate the formation of the 
Moon? In canonical giant impact events that make the Moon’, most of the Earth’s 
mantle is molten or partly molten, thus facilitating the segregation of HSEs into the 
core. Recent models have pushed the giant collisions to higher energies with respect 
to the canonical models; therefore an efficient segregation of pre-giant impact HSEs 
is even more likely (R. M. Canup, personal communication, April 2014). Ref. 44 
found that ε¹⁸²W (that is, the deviation of the sample ¹⁸²W/¹⁸⁴W ratio from the terrestrial standard
value, in parts per 10⁴) of the Moon is significantly different from that of
the Earth. This can be explained if an amount of tungsten with a broadly chondritic
isotope ratio was delivered in chondritic proportions with the HSEs. This would
suggest that HSEs were mostly accreted after the formation of the Moon.

Simulations of large terrestrial impacts with iSALE. Investigating the thermal
effects of the early bombardment history on Earth requires a detailed quantitative
understanding of the consequences of hypervelocity impacts of cosmic bodies of 
given mass, composition, velocity, and angle of incidence. Hydrocode modelling 
may serve as the most accurate approach with which to estimate crater size and the 
amount of shock wave-induced heating and melting of crustal and mantle rocks as 
the result of a collision. However, given the large number of collisions produced on 
the early Earth, it is impossible to model each impact event individually. A para-
meterization of the relationship between the size of an impact event (projectile
mass m, diameter d, impact velocity v and impact angle α) and the resulting crater
diameter D, depth h, volume V, excavation depth d_ex, excavation volume V_ex and
melt volume V_melt is required. Existing scaling relationships (for example, refs 45-54)
are based on laboratory and numerical experiments, analytical considerations
and observations of the Earth’s and lunar crater records. Whether these scaling
relationships can be extrapolated to impactors several hundred
kilometres in diameter, such as those that struck the early Earth, is
questionable; further analysis and/or modifications of existing scaling
laws are required to confirm their applicability to the given problem.

We used the hydrocode iSALE (see ref. 15 and references therein) to conduct a 
series of two-dimensional numerical models of impacts with projectile diameters d 
ranging from 1 to 1,000 km and impact velocities v of 8.5-17 km s⁻¹ (a few runs for
4,000-km impactors were also performed). In all models the impactor is resolved by
50 cells per projectile radius, and we assume a dunitic composition with a density
δ = 3,314 kg m⁻³. We do not consider the impact angle α, which naturally can vary
between 0° and 90°, with the most likely encounter at 45°, and modelled vertical
impacts (90°) only on a cylindrical, axially symmetric two-dimensional grid on a planar
target surface. Both simplifications (vertical impacts and planar target surface) reduce
the computational costs of an individual simulation significantly and thus permit
detailed parameter studies based on a sufficient number of numerical models. To
compensate for the lack of varying impact angles in our models we assume an impact
velocity that corresponds to the vertical component of the velocity vector (v_⊥ = v sin α),
an often-used simplification to approximate oblique impacts by two-dimensional
simulations****** that was originally suggested in ref. 57.
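
The vertical-component substitution described above is a one-line conversion; the sketch below shows it for the most probable impact angle of 45°. The function name and the sample speeds of 12 and 24 km s⁻¹ are illustrative assumptions, not values stated in the paper.

```python
import math

def vertical_velocity(v_kms, angle_deg):
    """Vertical component of the impact velocity.

    The angle is measured from the target surface, so 90 degrees is a vertical impact.
    """
    return v_kms * math.sin(math.radians(angle_deg))

for v in (12.0, 24.0):
    v_vert = vertical_velocity(v, 45.0)
    print(f"v = {v:.0f} km/s at 45 degrees: vertical component {v_vert:.1f} km/s")
```

The sample speeds were chosen only because their vertical components at 45° fall at the two ends of the 8.5-17 km s⁻¹ range simulated here.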
In the baseline scenario, we assume a layered target composed of a lithosphere
125km thick consisting of a 30-km granitic crust and a 95-km dunitic upper 
mantle. Within the lithosphere, heat is transported by conduction, giving rise to 
a relatively steep temperature gradient from 20 °C at the surface to 1,427 °C at the 
transition from the upper to the lower mantle (the asthenosphere; Extended Data 
Fig. 3). The temperature gradient in the asthenosphere is adiabatic according to the 
assumption that heat is transported by convection. At a depth of 2,930km we 
consider an iron core with a constant temperature of 2,727 °C. Because we assume 
pure iron and neglect alloy composition, the solidus is relatively high and the core 
is in a solid state; however, the rheological properties of the core do not affect our 
models, because the craters in the biggest impact events under consideration do 
not reach as deep. 

The thermodynamic behaviour of matter in our models is simulated by the 
ANEOS (Analytic Equation of State; ref. 58) for granite’ and dunite”’. ANEOS can 
only account for one phase transition; we therefore took into account the solid 
state transition expected to have the biggest effect on the total amount of melt 
production by shock heating. Because we do not consider latent heat of melting, 
our estimate of shock melting corresponds to an upper estimate (for further 
explanations see refs 53, 60, 61). The rheological model, the resistance of rocks 
against plastic deformation, is explained in detail in ref. 62. We do not account for 
temporary weakening of matter during crater formation by acoustic fluidization” 
as required to explain mid-size complex crater morphologies. All material prop- 
erties and model parameters are listed in Extended Data Table 1. We also neglect 
the effects of target spherical symmetry, which are estimated to contribute less 
than ~20% to the volume of melt for the impactor sizes considered in our work’’. 
Note that direct two-dimensional cylindrical iSALE simulations (equivalent to full 
three-dimensional simulations for head-on collisions) have shown that the target 
curvature is negligible for projectile-to-target size ratios up to ~0.2 (ref. 64), 
corresponding to a projectile of 2,500 km for the Earth, in agreement with ana- 
lytical estimates'’. This gives confidence in the validity of the analytical estimates. 
The analytically estimated error of ~20% for the largest projectiles is within the 
error of our model. 

All models begin with the first contact between the projectile and the target (we 
neglect the presence of an atmosphere) and stop after the collapse of the transient 
crater. Our models include the structural uplift of matter but do not last until all 
material is settled and the final crater is reached. Primarily our simulations aim at 
the determination of crater size, excavation depth and the melt that is generated by 
the impact event. 

Previous studies aimed at the computation of impact melt volumes usually 
considered shock-induced melting only. Melting of rocks during impact is the 
result of shock-wave compression and subsequent release. Shock compression is 
an irreversible process in which plastic work is done on the target material that 
remains in the rock as heat after subsequent isentropic release and can raise the 
temperature of the target above the melt temperature. To quantify the amount of 
impact-generated melt it is necessary to determine the volume of material that 
experiences a peak shock pressure in excess of the material’s critical shock pressure 
for melting (P_c). The critical shock pressure for melting (or the corresponding
entropy) for granite and dunite is a material property that can be measured by
shock experiments (see, for example, ref. 65) and serves as an input parameter for
ANEOS*. Because the petrographical composition for a given rock type, such as
dunite, may vary, the stated P_c values found in the literature range from ~91 to
~156 GPa depending on whether pure forsterite or peridotite composition and
incipient or complete melting are considered, respectively. Extended Data Fig. 4a
shows the impact melt production determined by hydrocode simulations for
impacts on layered targets (granite and dunite; see above) in comparison with
scaling relationships proposed by refs 17, 18, 51. Extended Data Fig. 4b shows the
dependence of melt production on impact velocity. The melt volume is deter-
mined in our hydrocode simulations by Lagrangian tracer particles that experience
shock pressures in excess of P_c. Each tracer represents the amount of matter in the
computational cell in which it was initially located (see, for instance, ref. 61).
Apparently, the melt volume varies at most by a factor of two depending on the
chosen P_c (91 or 156 GPa, respectively), which we consider to be insignificant for
the present study. Thus, a more accurate approach considering partial melting if
the post-shock temperature is between solidus and liquidus as proposed in ref. 66
was not included in this study. The much lower P_c for granite (46 GPa for incipient
melting and 56 GPa for complete melting”) raises the total amount of melt for
impactors <100 km in diameter more significantly; however, for very large impac- 
tors (>100km in diameter) the total amount of melt is dominated by mantle 
material, and the contribution of crustal material is negligible. 

The critical pressure method for the quantification of impact melt production is 
in good agreement with estimates of the observed melt volumes at terrestrial 
impact craters”, but it may not provide accurate estimates for very large impactors 
several hundred kilometres in diameter penetrating deep into Earth’s mantle, for
two reasons. First, with increasing depth at which material experiences shock com-
pression, the pre-impact temperature and lithostatic pressure become important.
The initial temperature of the rock affects the critical melt pressure P_c: preheated
rock tends to show shock-metamorphic effects, including melting, at lower shock pres-
sures than rocks at normal surface temperature. Second, at depths greater than
approximately the transition from the lithosphere to the asthenosphere, rocks may
not melt at all because the lithostatic pressure raises the solidus above the shock-
induced temperature increase. Conversely, structural uplift of originally deep-seated
material as a result of the gravity-driven collapse of the transient crater may give 
rise to decompression melting. However, the effect of decompression melting has 
been estimated to be small in comparison with shock-induced melting for impac- 
tors 20 km in diameter®, but may well be important for very large impactors and 
steep geotherms®. 

To account for both effects (temperature increase with depth and unloading 
from the shock pressure to the lithostatic pressure at given depth) we used an 
alternative approach to determine the total amount of impact-generated melt. We 
simply record through all computational time steps in our hydrocode simulations 
whether the temperature of a tracer is in excess of the solidus temperature as a 
function of pressure for the given location (depth) and mark it as molten. For small 
impactors (a few kilometres in diameter) this approach provides the same results 
as the P_c method (see Extended Data Fig. 4). With increasing projectile diameter
the melt volume deviates from the scaling lines (open circles) according to the critical 
pressure method, and slightly increased melt volumes occur (50-100 km projectile 
diameter; note the small variation at 25 km diameter resulting from the change from 
a granitic to a dunitic melt composition). For larger impactors (>100 km diameter) 
the melt volumes decrease below the expected trend according to a straight line on a 
double-logarithmic plot (power-law scaling). This is because shock-
heated material does not unload to pressures at which the post-shock temperature
exceeds the solidus temperature: the increase
in solidus temperature with depth is steeper than the increase in adiabatic temper-
ature in the lower mantle (Extended Data Fig. 3). Increasingly higher shock pres-
sures are therefore required to raise the temperature above the solidus. In summary,
the difference between iSALE simulations and the analytical estimates can be under- 
stood by accounting for the different assumptions used for each case. For example, 
using projectile diameters in the range ~100-1,000km, our simulations predict 
more melt than analytical estimates do'”'*, because the code accounts for decom- 
pression melting. In contrast, for projectiles larger than 1,000 km, iSALE finds less 
melt than our analytical estimates'”’* because the latter neglect the increase of the
solidus temperature as a function of depth. 
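
The two melt-accounting approaches compared above can be contrasted in a minimal sketch. It is an illustration under stated assumptions rather than the iSALE implementation: the tracer data are made up, the dictionary layout and function names are hypothetical, and the linear solidus is a placeholder for the pressure-dependent solidus used in the simulations.

```python
def solidus_K(pressure_GPa):
    """Hypothetical linear solidus, for illustration only (not the paper's parameterization)."""
    return 1400.0 + 100.0 * pressure_GPa

def melt_volume_critical_pressure(tracers, p_crit_GPa):
    """Critical pressure method: sum the volumes of tracers whose peak
    shock pressure exceeds the critical pressure for melting."""
    return sum(t["volume"] for t in tracers if max(t["pressure"]) > p_crit_GPa)

def melt_volume_solidus(tracers):
    """Alternative method: sum the volumes of tracers whose temperature
    exceeds the local solidus at any recorded time step."""
    total = 0.0
    for t in tracers:
        if any(T > solidus_K(P) for T, P in zip(t["temperature"], t["pressure"])):
            total += t["volume"]
    return total

# Two made-up tracers: pressure (GPa) and temperature (K) over three time steps.
tracers = [
    {"volume": 1.0, "pressure": [5.0, 120.0, 4.0], "temperature": [1500.0, 4000.0, 2100.0]},
    {"volume": 2.0, "pressure": [20.0, 60.0, 15.0], "temperature": [1900.0, 2500.0, 2950.0]},
]
print(melt_volume_critical_pressure(tracers, p_crit_GPa=91.0))
print(melt_volume_solidus(tracers))
```

In this toy data set the second tracer never reaches the critical shock pressure but crosses the solidus once it has unloaded to lower pressure, which is the kind of contribution the solidus-based method captures and the critical pressure method misses.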

Our analytical scaling relationships for melt production and crater sizes have 
been used as a rough guide, but they do not affect the conclusions of our paper,
which are based on hydrocode modelling of impacts. Similarly, details of the com-
position and nature of the Hadean crust and the geotherm, for example, have little
effect on our results. This is mainly because our conclusions are based on the 
effects of large collisions (impactors larger than 100 km); the volumes of melt 
produced by big impactors are largely insensitive to variation in the mantle poten- 
tial temperature and its composition. The key factor is the overburden pressure, as 
detailed above. 

We also find that the melt volume increases by ~75% if the mantle potential 
temperature increases by 200°C (ref. 70) with respect to the standard case (or 
1,600 °C). We also tested the case of a thinner lithosphere (80 km) and found that, 
for impactors larger than 1,000 km, the volume of melt is generally a few per cent
higher than for the nominal lithosphere (125 km), and up to 50% higher for impactor
sizes between 100 and 1,000 km.

Finally, our estimates of impact-generated melt volume do not consider adiabatic 
melting of rising mantle elements. The latter process, yet to be the subject of a 
systematic study, is potentially very important for large projectiles. As discussed in 
ref. 20, projectiles larger than ~800 km may produce long-lasting perturbations to 
mantle dynamics. On a short timescale (1-10 Myr relevant for our work), the 
perturbation is characterized by umbrella-shaped patterns with rising elements at 
the centre, lateral spreading in the upper mantle and subsequent downwelling (see, 
for instance, Fig. 7 of ref. 20). As a result, voluminous quantities of rising mantle
may melt adiabatically. It is therefore possible that our estimates of the melt pro- 
duction (Fig. 3) are lower bounds, in particular for projectiles larger than 1,000 km. 

Concerning the estimate of the final crater size discussed in the text (see, for 
example, Fig. 2a), we relied on a transient-to-final crater size scaling law derived 
from lunar craters’'. Recent work has found that the temperature of the lithosphere 
has an important role in the modification stage of crater formation”’, and for a 
hotter target the final crater size can be twice as large as in classical cold scaling. The 
hot lithosphere scaling is probably more realistic for the early Earth; therefore the 
final crater sizes discussed here are likely to be underestimated by a factor of ~2. 

Impact-generated melt extrusion and Hadean zircon formation. Although it is
well-known that zircons can crystallize from impact melt pools (see, for example, 
ref. 31), it is argued that the low crystallization temperatures of Hadean zircons are
largely incompatible with such an origin*”’"”*. Our model shows that a fraction of 
the Earth’s uppermost surface layer was melted by impacts. By analogy to recent 
terrestrial craters (for example Sudbury), it is expected that zircons should have 
also crystallized from the melt, even for intermediate-to-mafic melts via fractional 
crystallization”. This process would produce zircons that crystallized at higher 
temperatures than most Hadean zircons*’”’. Although it is conceivable that a 
fraction of the high-temperature Hadean zircons (720 °C < T < 750 °C) may
come from impact melt pools (see, for example, ref. 28), this process seems sec- 
ondary. Thus, a natural question arises: if impact melt was widespread on the 
Hadean Earth, why do Hadean zircons have a low crystallization temperature? 

Our proposed mechanism naturally explains this observation. We computed 
the total volume of impact-generated melt produced in a spherical shell 100 km 
thick (that is, the crust) by all projectiles smaller than 50 km. Larger projectiles were
assumed not to contribute here because they blast through the shell. The result is 
that at least 10-20% of the shell is directly melted by impacts. For comparison, we 
also compute the volume of the shell buried by melt extrusion around the impacts 
(corresponding to an annulus from 10 to 30 times impactor radii from the impact 
point, only for projectiles larger than 100km). This yields >800% of the shell 
volume, implying that the shell is reprocessed over and over. Therefore the latter
process dominates by almost two orders of magnitude with respect to direct
impact melt of the shell, explaining the paucity of high crystallization temperatures
among Hadean zircons. Note that this result is basically independent of the
assumed thickness of the shell. A more significant contribution to zircon forma- 
tion may come from the thick layers of melt extruded onto the surface (Fig. 2b and 
Extended Data Fig. 5). Given the deep origin of these magmas, however, they were 
probably ultramafic in composition, thus inhibiting significant zircon formation. 
Even assuming that low-temperature zircons could have formed from high- 
temperature melts as a result of fractional crystallization after a substantial decrease 
in temperature (see, for example, ref. 74), the volume of this fractionated reservoir 
would have been negligible with respect to the volume of the material buried. Thus 
the main contribution of these mantle mafic melts was probably as a crustal heat 
source, not as zircon source material. As discussed in the text, Hadean zircons 
probably crystallized from wet eutectic melts’. Such conditions may have been 
achieved as a result of impact-generated melt burial of large portions of the surface. 
In other words, buried weathered material could have been efficiently heated to wet 
melting conditions by magmatic intrusions (ref. 75, p. 168) from the mantle (as 
claimed to explain the volcanic plains around the 2,300-km Hellas basin on Mars; 
refs 19, 25, 76), and by the steepened geotherm produced by thinning of the 
lithosphere (close to the crater’s rims). At larger radial distances, the sinking of 
lava and crustal recycling (see Fig. 10 of ref. 26; see also a recent commentary in ref. 
77) may have been particularly important in a regime characterized by a highly 
fractured crust (due to impacts), as in the Hadean. 
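The burial estimate above rests on an annulus extending from 10 to 30 impactor radii around each impact point. As a rough illustration of that geometry only, the sketch below computes the area of one such annulus and the fraction of the Earth's surface it would cover. It assumes a flat annulus (planetary curvature is ignored) and uses the planet radius of 6,371 km from Extended Data Table 1; the function names are hypothetical and this is not the authors' Monte Carlo procedure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # planet radius listed in Extended Data Table 1


def burial_annulus_area_km2(projectile_diameter_km, inner=10.0, outer=30.0):
    """Area of a flat annulus spanning `inner` to `outer` impactor radii."""
    radius_km = projectile_diameter_km / 2.0
    return math.pi * ((outer * radius_km) ** 2 - (inner * radius_km) ** 2)


def surface_fraction(area_km2, planet_radius_km=EARTH_RADIUS_KM):
    """Fraction of a sphere's surface covered by `area_km2` (no overlap handling)."""
    return area_km2 / (4.0 * math.pi * planet_radius_km ** 2)


# A single 100-km projectile, the smallest size considered for burial in the text.
area = burial_annulus_area_km2(100.0)
print(f"annulus area: {area:.3e} km^2, surface fraction: {surface_fraction(area):.4f}")
```

Because the annulus area scales as 800 times pi times the squared impactor radius, a modest number of large projectiles is enough to cover the surface several times over, which is the qualitative point of the >800% figure.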

Finally, as discussed in the text, we investigated several impactor fluxes 
(Extended Data Fig. 2) and how they compare to the Hadean zircon age distri- 
bution. The key parameter for this comparison is the ratio of the diameter of the 
surface melt to the diameter of the projectile (f; see Fig. 3). In the limits of the 
approximations described above (namely f ~ 20-30; see the text), we find that only 
the nominal case (namely LHB at 4.15 Gyr ago; red curve in Extended Data Fig. 2) 
reproduces the Hadean zircon age distribution. The case of an LHB at 3.9 Gyr ago 
fails to reproduce the zircon data for any value of f, whereas the case with no LHB 
(green curve in Extended Data Fig. 2) only matches the zircon data for f > 50. This 
large f value is unjustified according to our estimates of melt volumes; this scenario 
is therefore extremely unlikely. 


33. Strom, R. G., Malhotra, R., Ito, T., Yoshida, F. & Kring, D. A. The origin of planetary impactors in the inner Solar System. Science 309, 1847-1850 (2005).
34. Marchi, S., Mottola, S., Cremonese, G., Massironi, M. & Martellato, E. A new chronology for the Moon and Mercury. Astron. J. 137, 4936-4948 (2009).
35. Marchi, S., Bottke, W. F., Kring, D. A. & Morbidelli, A. The onset of the lunar cataclysm as recorded in its ancient crater populations. Earth Planet. Sci. Lett. 325, 27-38 (2012).
36. Fassett, C. I., Head, J. W. & Kadish, S. J. et al. Lunar impact basins: stratigraphy, sequence and ages from superposed impact crater populations measured from Lunar Orbiter Laser Altimeter (LOLA) data. J. Geophys. Res. Planets 117, E00H06 (2012).
37. Schlichting, H. E., Warren, P. H. & Yin, Q.-Z. The last stages of terrestrial planet formation: dynamical friction and the late veneer. Astrophys. J. 752, 1-8 (2012).
38. Melosh, H. J. Impact Cratering: A Geologic Process (Oxford Monographs on Geology and Geophysics no. 11, Clarendon Press, 1989).
39. Murray, C. D. & Dermott, S. F. Solar System Dynamics (Cambridge Univ. Press, 1999).
40. Ryder, G. Mass flux in the ancient Earth-Moon system and benign implications for the origin of life on Earth. J. Geophys. Res. Planets 107, E45022 (2002).
41. Norman, M. D. & Nemchin, A. A. A 4.2 billion year old impact basin on the Moon: U-Pb dating of zirconolite and apatite in lunar melt rock 67955. Earth Planet. Sci. Lett. 388, 387-398 (2014).
42. Leinhardt, Z. M. & Stewart, S. T. Collisions between gravity-dominated bodies. I. Outcome regimes and scaling laws. Astrophys. J. 745, 79 (2012).
43. Tonks, W. B. & Melosh, H. J. Core formation by giant impacts. Icarus 100, 326-346 (1992).
44. Kleine, T., Kruijer, T. S. & Sprung, P. in Lunar and Planetary Science Conf. 45, 2895 (2014).
45. O'Keefe, J. D. & Ahrens, T. J. Planetary cratering mechanics. J. Geophys. Res. 98 (E9), 17011-17028 (1993).
46. Holsapple, K. A. The scaling of impact processes in planetary sciences. Annu. Rev. Earth Planet. Sci. 21, 333-373 (1993).
47. Wünnemann, K., Nowka, D., Collins, G. S., Elbeshausen, D. & Bierhaus, M. in Proceedings of 11th Hypervelocity Impact Symposium, 1-13 (2011).
48. Elbeshausen, D., Wünnemann, K. & Collins, G. S. Scaling of oblique impacts in frictional targets: implications for crater size and formation mechanisms. Icarus 10.1016/j.icarus.2009.07.018 (2009).
49. Grieve, R. A. F. & Cintala, M. J. An analysis of different impact melt-crater scaling and implications for the terrestrial impact record. Meteoritics 27, 526-538 (1992).
50. Ahrens, T. J. & O'Keefe, J. D. in Impact and Explosion Cratering (eds Roddy, D. J., Pepin, R. O. & Merrill, R. B.) 639-656 (Pergamon, 1977).
51. Bjorkman, M. D. & Holsapple, K. A. Velocity scaling impact melt volume. Int. J. Impact Eng. 5, 155-163 (1987).
52. Grieve, R. A., Cintala, M. J. & Therriault, A. M. Large-scale Impacts and the Evolution of the Earth's Crust: the Early Years (Geol. Soc. Am. Spec. Pap. 405, 2006).
53. Pierazzo, E., Vickery, A. M. & Melosh, H. J. A reevaluation of impact melt production. Icarus 127, 408-423 (1997).
54. Abramov, O., Wong, S. M. & Kring, D. A. Differential melt scaling for oblique impacts on terrestrial planets. Icarus 218, 906-916 (2012).
55. Ivanov, B. A. & Artemieva, N. A. in Catastrophic Events and Mass Extinctions: Impact and Beyond (eds Koeberl, C. & MacLeod, K.) 619-629 (Geol. Soc. Am. Spec. Pap. 356, 2002).
56. Pierazzo, E. & Melosh, H. J. Melt production in oblique impacts. Icarus 145, 252-261 (2000).
57. Chapman, C. R. & McKinnon, W. B. in Satellites (eds Burns, J. A. & Matthews, M. S.) 492-580 (Univ. Arizona Press, 1986).
58. Thompson, S. L. & Lauson, H. S. Improvements in the Chart D Radiation-Hydrodynamic Code 3: Revised Analytic Equation of State (Sandia Laboratories report SC-RR-71 0714, 1972).
59. Benz, W., Cameron, A. G. W. & Melosh, H. J. The origin of the Moon and the single impact hypothesis. III. Icarus 81, 113-131 (1989).
60. Melosh, H. J. A hydrocode equation of state for SiO2. Meteorit. Planet. Sci. 42, 2079-2098 (2007).
61. Wünnemann, K., Collins, G. S. & Osinski, G. R. Numerical modelling of impact melt production in porous rocks. Earth Planet. Sci. Lett. 269, 530-539 (2008).
62. Collins, G. S., Melosh, H. J. & Ivanov, B. A. Modeling damage and deformation in impact simulations. Meteorit. Planet. Sci. 39, 217-231 (2004).
63. Melosh, H. J. Acoustic fluidization: a new geologic process? J. Geophys. Res. 84, 7513-7520 (1979).
64. Bierhaus, M., Noack, L., Wuennemann, K. & Breuer, D. in Lunar and Planetary Science Conf. 44, 2420 (2013).
65. Stöffler, D. Deformation and transformation of rock-forming minerals by natural and experimental shock processes. I. Behavior of minerals under shock compression. Fortschr. Mineral. 49, 50-113 (1972).
66. Jones, A. P., Wünnemann, K. & Price, D. in Plates, Plumes, and Paradigms (eds Foulger, G. R., Natland, J. H., Presnall, D. C. & Anderson, D. L.) 711-720 (Geol. Soc. Am. Spec. Pap. 388, 2005).
67. Huffman, A. R. & Reimold, W. U. Experimental constraints on shock-induced microstructures in naturally deformed silicates. Tectonophysics 256, 165-217 (1996).
68. Schmitt, R. T. Shock experiments with the H6 chondrite Kernouve: pressure calibration of microscopic shock effects. Meteorit. Planet. Sci. 35, 545-560 (2000).
69. Ivanov, B. A. & Melosh, H. J. Impacts do not initiate volcanic eruptions: eruptions close to the crater. Geology 31, 869-872 (2003).
70. Herzberg, C., Condie, K. & Korenaga, J. Thermal history of the Earth and its petrological expression. Earth Planet. Sci. Lett. 292, 79-88 (2010).
71. McKinnon, W. B. & Schenk, P. M. Ejecta blanket scaling on the Moon and Mercury - inferences for projectile populations. Lunar Planet. Inst. Sci. Conf. Abstr. 16, 544-545 (1985).
72. Wielicki, M. M., Harrison, T. M. & Schmitt, A. K. Geochemical signatures and magmatic stability of terrestrial impact produced zircon. Earth Planet. Sci. Lett. 321, 20-31 (2012).
73. Harrison, T. M. & Schmitt, A. K. High sensitivity mapping of Ti distributions in Hadean zircons. Earth Planet. Sci. Lett. 261, 9-19 (2007).
74. Nutman, A. P. Comment on 'Zircon thermometer reveals minimum melting conditions on earliest Earth' II. Science 311, 779 (2006).
75. Turcotte, D. L. & Schubert, G. Geodynamics 2nd edn (Cambridge Univ. Press, 2002).
76. Rogers, A. D. & Nazarian, A. H. Evidence for Noachian flood volcanism in Noachis Terra, Mars, and the possible role of Hellas impact basin tectonics. J. Geophys. Res. Planets 118, 1094-1113 (2013).
77. Pearce, J. A. Geochemical fingerprinting of the Earth's oldest rocks. Geology 42, 175-176 (2014).
78. Zahnle, K. et al. Emergence of a habitable planet. Space Sci. Rev. 129, 35-78 (2007).
79. Sleep, N. H. The Hadean-Archaean environment. Cold Spring Harb. Perspect. Biol. 2, a002527 (2010).
80. Arndt, N. T. & Nisbet, E. G. Processes on the young Earth and the habitats of early life. Annu. Rev. Earth Planet. Sci. 40, 521-549 (2012).


©2014 Macmillan Publishers Limited. All rights reserved 


[Plot: frequency (arbitrary unit) versus impactor size (km); size axis ticks at 10, 100 and 1,000 km.]


Extended Data Figure 1 | Early Earth’s impactor size-frequency 
distributions. The red curve corresponds to current main-belt asteroids larger 
than 10 km. The largest object is Ceres, whose diameter is ~913 km. The black 
curve (vertically shifted for clarity) is a replicate of the main-belt curve, 
extrapolated to 4,000 km by using the slope in the size range 500-913 km. 

Extended Data Figure 2 | Lunar impact fluxes. The differential number of lunar craters >20 km (N20) as a function of time and per unit surface for several scenarios discussed in the text. 

Extended Data Figure 3 | Solidus and geotherm used in iSALE simulations. Note the temperature increase in the lithosphere that results in an increase in the temperature of buried surface material. Other processes resulting in an increase of the temperature of the buried crust are discussed in the text. The assumed thermal gradient is a lower limit (see Extended Data Table 1), implying that the increase in the temperature of the buried material can be significantly higher than is shown here. 

Extended Data Figure 4 | Impact-generated melt volume. Left, comparison of melt volume production for various methods; right, comparison of melt volume production for various impact velocities and mantle potential temperature (see the text for more details). 

Extended Data Figure 5 | Melt spreading over the first 100 Myr of Earth history. Mollweide projections of the cumulative record of craters at four different times. There are portions of the Earth's surface that are not affected by impact-generated melt at each time step, except for the first 25 Myr (or >4.475 Gyr). However, there is no significant fraction of the Earth's surface that is unaffected by impacts before ~4 Gyr ago (see also Fig. 2b). Impacts therefore set the stage for the environmental conditions on the Hadean Earth and have implications for the origin and development of life (see, for instance, discussion in refs 78-80). 

Extended Data Figure 6 | Minimum impact time for projectiles larger than 500 km. Blue dots indicate the minimum impact time for impactors larger than 500 km recorded in 163 successful Monte Carlo simulations (see Fig. 1). The vertical axis reports the number of impacts (in bins of 25 Myr each) normalized by the number of simulations. The lowest y values shown in the plot correspond to one impact. The median time is 4.32 Gyr ago, and the mean is 4.27 Gyr ago. The earliest evidence of life on Earth (~3.8 Gyr ago), and the start of the Late Heavy Bombardment (~4.15 Gyr ago; see the text) are also indicated. About 10% of the simulations have a minimum time of 4 Gyr ago or less. 

Extended Data Table 1 | Various parameters used for iSALE simulations 

Model parameters 

Parameter                                     Value
Cells per projectile radius (CPPR)            50
Gravity (m/s²)                                9.81
Crustal thickness (km)                        30
Mantle thickness (km)                         2900
Impact velocity (km/s)                        8.5; 12.7; 17.0
Impactor size (km)                            1-4000
Surface temperature (°C)                      20
Lithospheric temperature gradient (°C/km)     11.25
Lithosphere thickness (km)                    125
Planet radius (km)                            6371
Core temperature (°C)                         2727

Material parameters 

Parameter                                           Granite    Dunite     Iron
Melt temperature at zero pressure (°C)              1400       1100       1538
Heat capacity (J/(kg*K))                            1000.0     1000.0     600.0
Constant Simon approximation for solidus (Pa)*      6.00E+9    1.52E+9    6.0E+9
Exponent in Simon approximation for solidus*        3.0        4.05       3.0
Cohesion of damaged material (Pa)                   1.0E+4     1.0E+4     1.0E+4
Cohesion of intact material (Pa)                    1.0E+7     1.0E+7     -
Coefficient of friction for intact material         0.6        0.6        0.4
Coefficient of friction for damaged material        2.0        1.2        -

Note that our results on impact-generated melt for large impactors are fairly insensitive to the assumed lithospheric geotherm gradient because most of the melt is produced at depth. Here we adopted a conservative low value. For additional discussion on the effects of geotherm see ref. 69. 
* For the definition of these parameters, see ref. 69. 
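To show how the tabulated thermal parameters combine, the sketch below evaluates a linear lithospheric geotherm and a pressure-dependent solidus. Two points are assumptions rather than statements from the table: the Simon approximation is taken in the commonly quoted form T_m(P) = T_m0 (P/a + 1)^(1/c) with temperatures in kelvin (the table defers the exact definition to ref. 69), and the geotherm is treated as strictly linear down to the base of the lithosphere. Function names are hypothetical.

```python
KELVIN_OFFSET = 273.15


def lithosphere_temperature_c(depth_km, surface_c=20.0,
                              gradient_c_per_km=11.25, lithosphere_km=125.0):
    """Linear geotherm within the lithosphere, using the tabulated defaults."""
    if not 0.0 <= depth_km <= lithosphere_km:
        raise ValueError("depth must lie within the lithosphere")
    return surface_c + gradient_c_per_km * depth_km


def simon_solidus_c(pressure_pa, melt_c_zero_pressure, a_pa, c_exp):
    """Assumed Simon form: T_m = T_m0 * (P / a + 1) ** (1 / c), evaluated in kelvin."""
    t0_k = melt_c_zero_pressure + KELVIN_OFFSET
    return t0_k * (pressure_pa / a_pa + 1.0) ** (1.0 / c_exp) - KELVIN_OFFSET


# Temperature at the base of the 125-km lithosphere and the dunite solidus at 3 GPa.
print(lithosphere_temperature_c(125.0))
print(round(simon_solidus_c(3.0e9, 1100.0, 1.52e9, 4.05), 1))
```

With the tabulated gradient, the base of the lithosphere sits at 20 + 11.25 × 125 = 1,426.25 °C, which is consistent with the note that the adopted gradient is a conservative low value.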

Seasonal not annual rainfall determines grassland biomass response to carbon dioxide 


Mark J. Hovenden1, Paul C. D. Newton2 & Karen E. Wills1 


The rising atmospheric concentration of carbon dioxide (CO2) should stimulate ecosystem productivity, but to what extent is highly uncertain, particularly when combined with changing temperature and precipitation. Ecosystem response to CO2 is complicated by biogeochemical feedbacks but must be understood if carbon storage and associated dampening of climate warming are to be predicted. Feedbacks through the hydrological cycle are particularly important and the physiology is well known; elevated CO2 reduces stomatal conductance and increases plant water use efficiency (the amount of water required to produce a unit of plant dry matter). The CO2 response should consequently be strongest when water is limiting; although this has been shown in some experiments, it is absent from many. Here we show that large annual variation in the stimulation of above-ground biomass by elevated CO2 in a mixed C3/C4 temperate grassland can be predicted accurately using seasonal rainfall totals; summer rainfall had a positive effect but autumn and spring rainfall had negative effects on the CO2 response. Thus, the elevated CO2 effect mainly depended upon the balance between summer and autumn/spring rainfall. This is partly because high rainfall during cool, moist seasons leads to nitrogen limitation, reducing or even preventing biomass stimulation by elevated CO2. Importantly, the prediction held whether plots were warmed by 2 °C or left unwarmed, and was similar for C3 plants and total biomass, allowing us to make a powerful generalization about ecosystem responses to elevated CO2. This new insight is particularly valuable because climate projections predict large changes in the timing of rainfall, even where annual totals remain static. Our findings will help resolve apparent differences in the outcomes of CO2 experiments and improve the formulation and interpretation of models that are insensitive to differences in the seasonal effects of rainfall on the CO2 response. 

Anthropogenic emissions of CO2 have been increasing, reaching 9.7 billion tonnes of carbon (C) in 2012 (ref. 15). The biosphere absorbs approximately 50% of anthropogenically emitted carbon per annum, so the continued ability of the biosphere to capture and sequester atmospheric CO2 is a critical determinant of the atmospheric CO2 concentration and thus of future climate. This is particularly true of the terrestrial biosphere, where carbon sequestration seems to be increasing. A principal uncertainty in projecting future sequestration is the extent of the elevated CO2 (eCO2) effect and, in particular, how this might be modified by potential changes in temperature and rainfall. 

High CO2 concentrations affect plant productivity in two main ways. First, elevated CO2 (eCO2) directly stimulates net carbon assimilation in C3 plants by increasing carboxylation and suppressing photorespiration rates. Second, eCO2 lowers stomatal conductance, thus reducing plant water use and leading to increased soil water retention. These two physiological effects (increased photosynthesis and greater water use efficiency) lead to the widely held conclusion that the eCO2 effect should be strongest when moisture is limited. If correct, this would be an extremely valuable generalization, which would provide much greater confidence in model projections of future CO2 responses. Some experimental evidence does support the generalization, as there are examples where the eCO2 response was strongest in dry years; unfortunately this is not an invariable outcome and there are many studies that do not show a relationship between annual precipitation and the strength of the eCO2 effect. 

Demand for water in grasslands varies seasonally, meaning that the benefit of an increased water use efficiency at eCO2 would differ across seasons. In addition, the eCO2 effect strengthens with increasing nitrogen availability, and nitrogen availability is strongly affected by the seasonal distribution of rainfall. It therefore seems possible, as suggested previously, that the eCO2 effect might be related to seasonal rather than annual rainfall patterns. Here we test that possibility using data from a grassland exposed to elevated CO2 and warming. 

In the TasFACE Global Change Impacts Experiment, circular plots of a native grassland in southeastern Tasmania, Australia, were exposed to a factorial combination of eCO2 concentration (control and 550 µmol mol⁻¹) and temperature (unwarmed and warmed by 2 °C). We assessed annual biomass production by harvesting in late summer (approximately February) over a 9-year period from 2002 to 2010. Because this site experiences summer drought, we define the year as starting 1 March, the first day of the southern autumn. This also matches this ecosystem's pattern of growth, which occurs mostly in spring and early summer. Rainfall during the experimental period was generally representative of the preceding century, although the period did contain the driest autumn and winters and the wettest spring on record (Extended Data Fig. 1). Because the strength of the eCO2 effect varied from year to year, we tested several factors to gauge their influence on biomass stimulation by eCO2. We used multimodel inference of multiple regression models to examine the importance of biomass in the control plots, biomass of the previous year, total annual rainfall, mean annual water availability, seasonal rainfall and mean seasonal water availability on the eCO2 effect. Because the eCO2 effect is strongly controlled by soil nitrogen, we also tested the impact of soil mineral nitrogen content during spring. 

Over the 9 years the eCO2 effect averaged 13.9% but varied substantially among years (year × CO2 F8,64 = 3.67, P < 0.002; Fig. 1), ranging from a suppression of biomass by 41.5 ± 5.9% (mean ± s.e.m.) to a stimulation of 96.3 ± 25.0% (mean ± s.e.m.; Fig. 1). The warming treatment did not vary across years (year × warming F8,64 = 1.47, P = 0.18) and importantly did not interact with the eCO2 effect (CO2 × warming F1,8 = 0.01, P = 0.91), nor did year, warming and eCO2 interact (year × CO2 × warming F8,64 = 0.80, P = 0.61). Therefore, warming by 2 °C did not alter the way that the eCO2 effect varied across years. 
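The percentages quoted above express the eCO2 effect as the difference between elevated and ambient biomass relative to ambient biomass. A minimal sketch of that calculation follows; the function name and the biomass values in the usage line are hypothetical placeholders, not data from the experiment.

```python
def eco2_effect_percent(elevated_biomass, ambient_biomass):
    """Percentage stimulation (positive) or suppression (negative) by eCO2."""
    if ambient_biomass <= 0:
        raise ValueError("ambient biomass must be positive")
    return 100.0 * (elevated_biomass - ambient_biomass) / ambient_biomass


# Hypothetical mean biomass values (same units for both treatments).
print(eco2_effect_percent(150.0, 100.0))  # stimulation
print(eco2_effect_percent(80.0, 100.0))   # suppression
```

Normalizing by the ambient plots means a negative value is a suppression of biomass, as in the year with the 41.5% reduction.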

Figure 1 | Annual elevated CO2 effect, seasonal soil water potential and rainfall in the TasFACE experiment. a, Average ± s.e.m. (n = 6 replicate plots) percentage stimulation of total above-ground biomass by eCO2 from 2002 to 2010. The eCO2 effect was calculated as the percentage difference between elevated and ambient CO2 plots compared with biomass in the ambient plots, pooled across both levels of warming treatment. A significant CO2 effect for a particular year (P < 0.05) as determined by repeated measures analysis of variance is indicated by an asterisk. The mean above-ground biomass of the ambient plots is indicated by the numbers above each column. b, Average ± s.e.m. (n = 6 replicate plots) seasonal soil water potential in elevated (filled symbols) and ambient CO2 plots (empty symbols). c, Seasonal rainfall totals for each year starting 1 March. A, autumn; W, winter; Sp, spring; S, summer. 

The eCO2 effect was poorly related to mean annual soil water availability (r² = 0.27, F1,8 = 2.60, P = 0.15) and annual rainfall (r² = 0.42, F1,8 = 5.1, P > 0.05) but could be predicted accurately from seasonal rainfall totals, with a model incorporating autumn, spring and summer rainfall totals describing 91% of the year-to-year variation in the eCO2 effect (r² = 0.91, F3,5 = 16.5, P < 0.005). The predictive power of seasonal soil water availability (Fig. 1) was substantially lower than that of seasonal rainfall, suggesting that rainfall influences the eCO2 effect in ways independent of its effect on soil water availability. Using weighted average values of the coefficients from all competitive models (Extended Data Table 1), the eCO2 effect was found to be directly proportional to summer rainfall (1.12 ± 0.77% mm⁻¹ (mean ± 95% confidence interval); Fig. 2) but was reduced by increasing rainfall in both autumn (−1.07 ± 0.76% mm⁻¹ (mean ± 95% confidence interval); Fig. 2) and spring (−0.48 ± 0.35% mm⁻¹ (mean ± 95% confidence interval); Fig. 2). Variation in winter rainfall had no influence on interannual variation in the eCO2 effect as none of the competitive models included winter rainfall. Additionally, models that incorporated annual rainfall or soil water availability as a term were less accurate than simpler models including only seasonal rainfall. None of the other covariates examined improved the explanatory power of the models. Hence, the eCO2 effect, whether combined with warming or not, depended upon the balance between summer and autumn/spring rainfall and was independent of other measured factors that varied among the years. The increase in productivity caused by eCO2 was greatest in years when summer rainfall approached or exceeded the amount of spring and autumn rainfall (Fig. 3), and smallest or actually negative in years with abundant autumn/spring rainfall or little summer rainfall (Fig. 3). 
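The weighted-average coefficients reported above can be read as marginal slopes: each extra millimetre of rain in a season shifts the eCO2 effect by the stated number of percentage points. The sketch below applies only those three mean slopes to a change in seasonal rainfall. The model intercept is not given in this section, so the function returns a change in the effect rather than an absolute prediction, and it ignores the confidence intervals; the function name is hypothetical.

```python
# Mean weighted-average slopes from the text, in percentage points per mm of rain.
SEASONAL_SLOPES = {"summer": 1.12, "autumn": -1.07, "spring": -0.48}


def eco2_effect_change(delta_rain_mm):
    """Change in the eCO2 effect (percentage points) for given rainfall changes.

    `delta_rain_mm` maps season names to rainfall differences in mm. Seasons
    without a slope (for example winter) contribute nothing, matching the
    finding that no competitive model included winter rainfall.
    """
    return sum(slope * delta_rain_mm.get(season, 0.0)
               for season, slope in SEASONAL_SLOPES.items())


# 10 mm more rain in summer only, then 10 mm more in every season.
print(eco2_effect_change({"summer": 10.0}))
print(eco2_effect_change({"summer": 10.0, "autumn": 10.0,
                          "spring": 10.0, "winter": 10.0}))
```

The second call illustrates why annual totals predict poorly: the same extra rain spread evenly across seasons gives a small net negative change because the autumn and spring slopes offset the summer one.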

The study system is a C3-C4 co-dominated grassland, so it is possible that the increased summer rainfall was only related to the eCO2 effect on C4 grasses, which are more active in the warmer summer months. This grassland contains a single C4 grass species, which, although relatively abundant, was not always present in sufficient quantities to allow analysis. Therefore, we analysed the eCO2 effect on C3 biomass alone and found similar, but not identical, relationships with seasonal rainfall totals to that occurring with total biomass, with autumn and spring rainfall suppressing and summer stimulating the eCO2 effect (Fig. 2). Although the seasonal rainfall balance did not predict the eCO2 effect on C3 biomass as well as it did for total biomass, the effect was similar and the rainfall balance still explained 75% of the variation in response across the years (Extended Data Fig. 2). Hence, the stimulatory effect of summer rainfall and the inhibitory effect of autumn/spring rainfall on the eCO2 effect applied to both C3-only and combined C3 and C4 vegetation, making these results widely relevant. 

We believe these seasonal effects arose partly through differences in plant water use under eCO2 but also through an interaction between rainfall amount and nitrogen availability. Our data show soil mineral nitrogen content declined in spring as rainfall increased (Fig. 4) and, because this was not accompanied by an increase in the amount of nitrogen located in plant biomass (Fig. 4), it is unlikely that the reduction was due to an increase in plant uptake but rather to leaching and/or denitrification; we have no data for autumn but assume similar processes would operate in this similar cool-wet season. As nitrogen availability declines it is well established that the eCO2 effect also declines, meaning that in spring/autumn there is a negative feedback on the eCO2 response. As increasing rainfall in these seasons also reduces the effects of greater water use efficiency, the combined effects are to minimize eCO2 responses in spring and autumn (Figs 2 and 3). 

Figure 2 | The influence of seasonal rainfall variation on the elevated CO2 effect. a-f, Partial regression plots showing the influence in the multiple regression model attributable to autumn (a, d), spring (b, e) and summer (c, f) on the eCO2 effect of total above-ground biomass (a-c) and C3 above-ground biomass (d-f). The solid line shows the modelled effect, with 95% confidence limits shown as dashed lines. 

Figure 3 | The impact of seasonal rainfall balance on the elevated CO2 effect. The mean (n = 6 replicate plots) annual eCO2 effect on total above-ground biomass as a function of the seasonal rainfall balance, which is defined as the difference between summer rainfall and the sum of autumn and spring rainfall. Spring rainfall totals were halved in determining the seasonal rainfall balance as the multimodel estimates indicated that the effect of spring rainfall was approximately half that of the other seasons. Relatively more rainfall in summer gives a positive rainfall balance value, whereas a negative rainfall balance occurs when more rain falls in autumn and spring. The r² value was determined by linear regression (F1,7 = 66.0, P < 0.0001). 
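The seasonal rainfall balance defined in the Figure 3 caption is summer rainfall minus the sum of autumn rainfall and half of spring rainfall. A minimal sketch of that definition follows; the function name and the rainfall totals in the usage lines are hypothetical.

```python
def seasonal_rainfall_balance(summer_mm, autumn_mm, spring_mm, spring_weight=0.5):
    """Summer rainfall minus (autumn + weighted spring) rainfall, in mm.

    Spring is halved by default, following the caption's statement that the
    spring effect was roughly half that of the other seasons.
    """
    return summer_mm - (autumn_mm + spring_weight * spring_mm)


# Hypothetical seasonal totals in mm.
print(seasonal_rainfall_balance(summer_mm=120.0, autumn_mm=80.0, spring_mm=100.0))
print(seasonal_rainfall_balance(summer_mm=200.0, autumn_mm=60.0, spring_mm=90.0))
```

A positive value indicates relatively more summer rain, the condition under which the largest eCO2 stimulation was observed.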


The situation differs in summer because rainfall in this and many other systems tends to occur in isolated, heavy events. In this case, growth stimulation by eCO2 will occur because water savings lengthen the growing period from any single rainfall event. In between these summer downpours, eCO2 will have little effect because the supply of water generally limits growth. An increase in the frequency of summer downpours increases the potential for increased growth and, therefore, the eCO2 effect will increase with increasing summer rainfall. Because of these two opposing processes, stimulation of annual biomass by eCO2 will depend upon the relative proportions of annual rainfall that occur in autumn/spring and summer, and very little upon the total amount that falls over the year. 

Figure 4 | The impact of spring rainfall on biomass nitrogen and soil mineral nitrogen content. a, Average ± s.e.m. (n = 6 replicate plots) biomass nitrogen (N) and b, soil mineral nitrogen concentration in ambient (open symbols) and elevated CO2 plots (filled symbols) during late spring as a function of total spring rainfall from 2007 to 2010. Soil mineral nitrogen concentration (sum of ammonium nitrogen and nitrate nitrogen concentrations) was obtained from 2 M KCl soil extracts. 

These results demonstrate that the pattern of rainfall during critical periods of the year has an overwhelming influence on the eCO2 effect. Importantly, the actual impact of increasing rainfall on biomass stimulation by eCO2 differs depending upon season; this might help explain the different patterns between annual rainfall and the eCO2 effect in different experiments and among years in individual experiments. The results force us to reconsider the general view that the eCO2 response of ecosystems will be strongest in drier conditions; this is a widespread view that has emerged from syntheses of experiments, has been the outcome of modelling and has consequently found its way into impact assessments. The interaction of water and nitrogen evident in our results has been identified as important previously, but the flow-on consequences across seasons leading to diametrically opposite eCO2 responses to rainfall is a finding that extends this understanding and suggests a review of models is necessary. Our data provide further experimental evidence that all global biogeochemical models used to model land-atmosphere interactions, particularly fluxes of carbon, need to include nutrients if they are to project fluxes accurately. 


METHODS SUMMARY 


The experiment was located in a species-rich native grassland dominated by C3 grasses from the genus Austrodanthonia and the C4 grass Themeda triandra. Full details of the experiment are available elsewhere. Every year above-ground biomass was sampled by clipping at the end of summer in one random 20 cm × 20 cm quadrat in each of three randomly chosen quadrants of each circular plot. Above-ground material was separated to species, dried to constant weight at 60 °C and weighed. In October (mid-spring) 2007 to 2010, six soil cores (3.5 cm diameter, 10 cm depth) were collected from each plot. Soil mineral nitrogen content was assessed on 2 M KCl soil extracts as the sum of ammonium nitrogen and nitrate nitrogen. 


Online Content Any additional Methods, Extended Data display items and Source Data are available in the online version of the paper; references unique to these sections appear only in the online paper. 

METHODS 

Experiment. The experiment was located in species-rich native temperate grassland in southeastern Tasmania, Australia (42°42' S 147°16' E). The region has a modified Mediterranean climate characterized by mild moist winters and warm dry summers. The vegetation was dominated by the perennial grasses Austrodanthonia caespitosa, A. carphoides and T. triandra (the only C4 species) although almost one-third of the recorded species were native perennial forbs. The C4 grass T. triandra contributes approximately 30% of the biomass in the grassland, although this varies substantially in space and from year to year. The experiment consisted of 12 free-air carbon dioxide enrichment (FACE) rings of 1.5 m diameter, in which vegetation was exposed to either ambient or elevated CO2 and was either warmed or unwarmed. Thus, the experiment was a factorial 2 × 2 design with three replicate plots of each CO2 × warming combination. FACE rings were fumigated to 550 µmol mol⁻¹ by the FACE method, using a modified pure-CO2 injection system. Warming was provided by the addition of 140 W m⁻² of infrared radiation using 240 V 250 W Emerson Solid Ceramic Infrared Emitters suspended 1.2 m above the soil surface and above the centre of each ring. The infrared emitters operated continuously and provided an average warming of canopy temperature of 1.98 °C and of soil temperature at 1 cm depth of 0.82 °C over the year. Full experimental details are available elsewhere. Given the constraints of a manipulative experiment such as a FACE experiment, we calculated the probable effect size that we could detect given the number of experimental replicates and the background level of variation in biomass in this grassland. When a significant interaction between the two factors was not present, as was the case here, we calculated that we could detect a 30% change in site productivity. 

Sampling and analyses. Every year, above-ground biomass was sampled by clip- 
ping to 2mm above the ground surface at the end of summer (in February of the 
following calendar year) in one 20 cm X 20cm quadrat randomly located in each 
of three randomly chosen quadrants of each circular plot. Above-ground material 
was separated to species, dried to constant weight at 60 °C and weighed. In October 
(mid-spring) 2007 to 2010, three soil cores (3.5 cm diameter, 10 cm depth) were
collected from under three randomly chosen patches of each of the C3- and C4-
dominated vegetation in each plot, giving six cores per plot. Soil cores were composited
according to species in the field, returned to the laboratory on ice and
stored at 4 °C for a maximum of 2 days before being extracted. Soil ammonium
nitrogen and nitrate nitrogen were determined for 2 M KCl extracts using protocols
already described*".

Rainfall was measured using a Campbell Scientific tipping bucket rain gauge 
(0.2 mm event size) connected to a CR-10X data logger. Seasons were defined as 
3-month periods starting on the first day of the month, with autumn starting on 
1 March. 
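
The season definition above can be sketched in code. The following is a minimal Python illustration, not the authors' processing script: the function names and the (date, millimetres) record format are assumptions, and the winter, spring and summer months are inferred from the stated rule of consecutive 3-month periods beginning with autumn on 1 March.

```python
from datetime import date

# Southern Hemisphere seasons as 3-month blocks, each starting on the first
# day of a month, with autumn starting on 1 March (as stated in the text).
SEASON_BY_MONTH = {
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
    12: "summer", 1: "summer", 2: "summer",
}


def season_of(day):
    """Return the season name for a calendar date."""
    return SEASON_BY_MONTH[day.month]


def seasonal_totals(events):
    """Sum rainfall (mm) per season from an iterable of (date, mm) records."""
    totals = {"autumn": 0.0, "winter": 0.0, "spring": 0.0, "summer": 0.0}
    for day, millimetres in events:
        totals[season_of(day)] += millimetres
    return totals
```

Under this mapping the February biomass harvest falls at the end of summer and the October soil sampling in mid-spring, which agrees with the sampling description.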




Statistical analyses. Biomass data were analysed using a repeated-measures two-
factor analysis of variance (ANOVA) in R (ref. 32), with CO2 and warming as fixed factors,
which was appropriate given the two-factor orthogonal design of the experiment. 
Biomass data were tested for normality and heteroscedasticity using box plots, resid- 
ual plots and Cochran’s test when these indicated heteroscedasticity was likely”’. 
Consequently, data were transformed to their natural logarithm before ANOVA, 
which provided a homoscedastic, normally distributed data set. As the analysis 
indicated that there were no CO2 × warming or year × CO2 × warming interactions,
the CO2 fertilization effect was determined on samples pooled across warming
treatments. Thus, the percentage stimulation of biomass by elevated CO2 (eCO2) was
calculated as the difference in biomass between eCO2 and ambient plots in proportion
to the biomass of ambient plots, ignoring the warming treatment. Relationships between
seasonal rainfall totals and the CO2 fertilization effect were determined by multiple
regression analyses using R. Potential collinearity between seasonal rainfall totals 
was examined by simple regressions and scatter plots**. There was no evidence of 
any collinearity between seasonal rainfall totals, so all seasons were retained in the 
multiple regression analyses. Relationships between the CO2 fertilization effect and
seasonal rainfall totals were determined using multimodel inference procedures (ref. 34)
with the MuMIn package in R. Beginning with all possible combinations of seasonal
rainfall totals, we ranked the resultant models using the Akaike information criterion
corrected for finite sample size (AICc). Model competitiveness was determined by
observation of the difference in AICc between each model and the lowest value of
AICc obtained (ΔAICc). Models were ranked in ascending ΔAICc value and a distinction
between competitive and non-competitive models was made by observing
any obvious breaks in the sequence of ascending ΔAICc. By this method
models with ΔAICc < 6 were deemed competitive and those with ΔAICc > 10
non-competitive. Model term coefficients were determined by calculating weighted average
values over all competitive models (ref. 34) and 95% confidence limits calculated. As
the winter rainfall term was not included in any of the competitive models, it was 
omitted from further analysis. We used partial regression analysis to determine 
the effects, with 95% confidence limits, of autumn, spring and summer rainfall totals 
on the annual CO2 fertilization effect for total above-ground biomass as well as for
that of C3 species alone using the effects package in R.
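
The percentage stimulation described above can be written as 100 × (eCO2 biomass − ambient biomass) / ambient biomass. The Python sketch below shows that arithmetic under the assumption that plot values are averaged within each CO2 level after pooling across warming treatments; the function name and the biomass numbers in the usage note are hypothetical and are not data from the study, which was analysed in R.

```python
from statistics import mean


def co2_stimulation_percent(elevated_biomass, ambient_biomass):
    """Percentage change in mean biomass under elevated CO2 relative to ambient.

    Both arguments are sequences of plot-level biomass values, already pooled
    across warmed and unwarmed plots (an assumption of this sketch).
    """
    ambient_mean = mean(ambient_biomass)
    elevated_mean = mean(elevated_biomass)
    return 100.0 * (elevated_mean - ambient_mean) / ambient_mean
```

For example, hypothetical elevated-CO2 plots averaging 125 units against ambient plots averaging 100 units correspond to a 25% stimulation.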


31. Osanai, Y. et al. Decomposition and nitrogen transformation rates in a temperate
grassland vary among co-occurring plant species. Plant Soil 350, 365-378 
(2012). 

32. R Development Core Team. R: A Language and Environment for Statistical 
Computing (R Development Core Team, 2011). 

33. Quinn, G. P. & Keough, M. J. Experimental Design and Data Analysis for Biologists 
(Cambridge Univ. Press, 2002). 

34. Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A 
Practical Information-Theoretic Approach 2nd edn (Springer, 2002). 

Extended Data Figure 2 | The impact of seasonal rainfall balance on the
elevated CO2 effect on biomass of C3 vegetation only. The mean (n = 6
replicate plots) annual elevated CO2 effect on above-ground biomass of C3
plants only as a function of the seasonal rainfall balance, which is defined as
the difference between summer rainfall and the sum of autumn and spring
rainfall. Spring rainfall totals were halved in determining the seasonal rainfall
balance as the multimodel estimates indicated that the effect of spring rainfall
was approximately half that of the other seasons. Relatively more rainfall in
summer gives a positive rainfall balance value, whereas a negative rainfall
balance occurs when more rain falls in autumn and spring. The solid line and
associated r² value are the result of a linear regression analysis (F,,s = 18.1,
P < 0.006).
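
The rainfall balance defined in this caption reduces to a one-line formula: summer rainfall minus the sum of autumn rainfall and half of spring rainfall. A minimal Python sketch follows; the function name and the rainfall totals used in the usage note are illustrative assumptions, not values from the study.

```python
def seasonal_rainfall_balance(autumn_mm, spring_mm, summer_mm):
    """Summer rainfall minus (autumn + half of spring), all in millimetres.

    Spring is weighted by 0.5, following the caption's statement that spring
    totals were halved when the balance was determined.
    """
    return summer_mm - (autumn_mm + 0.5 * spring_mm)
```

With hypothetical totals of 100 mm in autumn, 200 mm in spring and 250 mm in summer, the balance is 250 − (100 + 100) = 50 mm, a positive value that indicates a relatively wet summer.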

Extended Data Table 1 | Model coefficients and performance for CO2 effect on above-ground biomass

Model #   Autumn rainfall   Spring rainfall   Summer rainfall   df   AICc    ΔAICc   Weight
1         -1.16             -0.49             1.19              5    102.2   0       0.68
2         -0.90             -                 1.00              4    104.4   2.11    0.23
3         -                 -0.32             0.58              4    107.8   5.54    0.04
4         -0.54             -0.37             -                 4    108.1   5.85    0.04

Models are ranked in order of the corrected Akaike information criterion (AICc). df, degrees of freedom. ΔAICc, the difference in AICc from the best model. Weight, Akaike weight, which corresponds to the probability
that the model is the best of all models tested.
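
As a worked illustration of how the Weight column relates to ΔAICc, the sketch below applies the standard Akaike weight formula, w_i = exp(−ΔAICc_i / 2) / Σ_j exp(−ΔAICc_j / 2), together with the usual small-sample correction AICc = −2 ln L + 2k + 2k(k + 1) / (n − k − 1). It is written in Python rather than with the MuMIn package used in the study, the function names are illustrative, and ΔAICc = 0 is assumed for the best model. Normalizing over only the four tabulated models gives weights close to, but not exactly matching, the published ones, possibly because the published weights were normalized over every model tested.

```python
import math


def aicc(log_likelihood, k, n):
    """Akaike information criterion with the small-sample correction.

    k is the number of estimated parameters and n the sample size.
    """
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(delta_aicc):
    """Convert a list of ΔAICc values into weights that sum to 1."""
    relative_likelihoods = [math.exp(-0.5 * delta) for delta in delta_aicc]
    total = sum(relative_likelihoods)
    return [value / total for value in relative_likelihoods]


# ΔAICc values for models 1-4 in the table (0 assumed for the best model).
table_deltas = [0.0, 2.11, 5.54, 5.85]
table_weights = akaike_weights(table_deltas)
```

Rounded to two decimals, the first weight computed this way is 0.68, matching the table entry for model 1.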

PTEX is an essential nexus for protein export in malaria parasites

During the blood stages of malaria, several hundred parasite-encoded
proteins are exported beyond the double-membrane barrier that sep- 
arates the parasite from the host cell cytosol’*. These proteins have 
a variety of roles that are essential to virulence or parasite growth’. 
There is keen interest in understanding how proteins are exported 
and whether common machineries are involved in trafficking the dif- 
ferent classes of exported proteins*’. One potential trafficking machine 
is a protein complex known as the Plasmodium translocon of exported 
proteins (PTEX)’®. Although PTEX has been linked to the export of 
one class of exported proteins'®"’, there has been no direct evidence 
for its role and scope in protein translocation. Here we show, through 
the generation of two parasite lines defective for essential PTEX com- 
ponents (HSP101 or PTEX150), and analysis of a line lacking the 
non-essential component TRX2 (ref. 12), greatly reduced traffick- 
ing of all classes of exported proteins beyond the double membrane 
barrier enveloping the parasite. This includes proteins containing 
the PEXEL motif (RxLxE/Q/D)'” and PEXEL-negative exported pro- 
teins (PNEPs)°. Moreover, the export of proteins destined for expres- 
sion on the infected erythrocyte surface, including the major virulence 
factor PfEMP1 in Plasmodium falciparum, was significantly reduced 
in PTEX knockdown parasites. PTEX function was also essential for 
blood-stage growth, because even a modest knockdown of PTEX com- 
ponents had a strong effect on the parasite’s capacity to complete the 
erythrocytic cycle both in vitro and in vivo. Hence, as the only known 
nexus for protein export in Plasmodium parasites, and an essential 
enzymic machine, PTEX is a prime drug target. 
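
The PEXEL consensus quoted above, RxLxE/Q/D, can be read as a simple sequence pattern: arginine, any residue, leucine, any residue, then glutamate, glutamine or aspartate. The Python sketch below scans an amino-acid string for that pattern. The function name and the example sequences are made up for illustration, and a bare pattern match says nothing about whether a protein is actually exported.

```python
import re

# RxLxE/Q/D: R, any residue, L, any residue, then E, Q or D.
# The lookahead allows overlapping occurrences to be reported as well.
PEXEL_PATTERN = re.compile(r"(?=(R[A-Z]L[A-Z][EQD]))")


def find_pexel_motifs(protein_sequence):
    """Return (0-based start, matched 5-mer) for each RxLxE/Q/D occurrence."""
    sequence = protein_sequence.upper()
    return [(match.start(), match.group(1)) for match in PEXEL_PATTERN.finditer(sequence)]
```

For the hypothetical sequence "MKKRILAEAA" the function reports one match, "RILAE", starting at position 3.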

To address the role of PTEX in protein export directly, we examined 
parasite lines defective in PTEX components for their capacity to trans- 
locate exported proteins. Two PTEX components, TRX2 and PTEX88 
(Fig. 1a), have auxiliary roles in PTEX function, because their deletion 
results in a substantial parasite growth defect'*’’. We therefore assumed 
that PTEX function is suboptimal in these lines. Here we show that sur- 
face expression of parasite antigens was substantially reduced in Plasmo- 
dium berghei TRX2-deficient parasites’? (Extended Data Fig. 1), which 
is consistent with a role for PTEX in protein export. To perturb PTEX 
function more fully, conditional mutants of two essential PTEX com- 
ponents, HSP101 and PTEX150 (refs 10, 12, 13), were also generated. 
These proteins are synthesized in late schizogony and early ring stage 
and reside in the parasitophorous vacuole membrane for the remainder 
of the erythrocytic cycle™. 

For HSP101, we generated a P. berghei line, Pbi101 KD, harbouring
HSP101 under the transcriptional control of an anhydrotetracycline (ATc)-
regulated transactivator element’* (Fig. 1b, c and Extended Data Fig. 2a).
The growth of Pbi101 KD parasites was specifically sensitive to treatment
with ATc (Fig. 1d, e). The Pbi101 KD line grew poorly in mice pre-exposed
to ATc 24 h before infection (Fig. 1d, top panel, and Extended Data Fig. 2b),
and normal growth of Pbi101 KD in the absence of ATc could be reversed
if ATc was added at day 4 (Fig. 1d, middle panel, and Extended Data
Fig. 2b). As expected", the growth of parental P. berghei ANKA parasites
was unaffected by the presence of ATc (Fig. 1d, bottom panel).

To examine the growth effect in more detail, purified ring-stage Pbi101
KD parasites were injected into mice pre-exposed to ATc, then isolated
29 h later and cultured in vitro with ATc (Fig. 1e). As expected, parasites
invaded erythrocytes in the mice and developed normally into ring stages
(Fig. 1e, 24 h time point). However, parasites appeared morphologically
abnormal by the 34 h time point and were incapable of developing into
schizonts by the 46 h time point, unlike Pbi101 KD parasites not exposed
to ATc. Asynchronous Pbi101 KD ring-stage parasites cultured in vitro
for 16 h in the presence of ATc demonstrated a threefold to sixfold decrease
in hsp101 messenger RNA in schizont stages (Fig. 1f) and an 85-90%
knockdown of HSP101 protein by the 29 h time point in Fig. 1e relative
to the loading control proteins EXP2 and MSP8 (Fig. 1g).

To examine whether HSP101 knockdown affected protein export, we
assessed whether asynchronous Pbi101 KD parasites harvested from ATc-
pretreated mice at the time point indicated by the grey bar in Fig. 1d displayed
surface-expressed antigens. Pbi101 KD parasites showed a strong
reduction in parasite-encoded surface antigens compared with parasites
grown in the absence of ATc (Fig. 2a). In an alternative approach,
Pbi101 KD parasites grown to a higher parasitaemia (~10%) in the absence
of ATc were subsequently treated with ATc for either 12 or 24 h before
analysis. Given the asynchronicity of the infection, some parasites would
already have transcribed HSP101 before ATc treatment commenced;
consistent with this, surface expression was reduced in a manner dependent
on the duration of ATc exposure (Fig. 2b).

We also assessed the export of individual proteins by immunofluorescence
assay (IFA) using specific antibody reagents against three different
exported proteins. Two of these proteins, PbANKA_114540 and
PbANKA_122900, contain the PEXEL motif (RxLxE/Q/D) and localize
to punctate structures in the erythrocyte cytosol’® (C. K. Moreira, B.
Naissant, A. Coppi, L. Bennet, E. Aime, B. Franke-Fayard, C. J. Janse,
I. Coppens, P. Sinnis and T. J. Templeton, personal communication),
whereas PbANKA_083680 (EMAP1) is a PNEP that localizes to the erythrocyte
membrane”. In each case, a striking blockage of protein export
was observed in Pbi101 KD parasites exposed to a variety of different
ATc treatment regimes; this included Pbi101 KD harvested from mice
at the time points represented by the grey bar and asterisks in Fig. 1d
(Fig. 2c and Extended Data Fig. 3a, respectively) and in Fig. 1e (Fig. 2c
and Extended Data Fig. 3a). For example, in morphologically normal
ring-stage parasites examined at the 24 and 29 h time points in Fig. 1e,
observable export of PbANKA_114540 and PbANKA_083680 could
only be detected in 7 out of 131 and 5 out of 100 parasites, respectively,
whereas none of 100 parasites visibly exported PbANKA_122900. In
contrast, almost equivalent numbers of Pbi101 KD parasites grown with
or without ATc (80% and 88% expression, respectively) expressed MSP8,
a parasite-membrane protein known to be strictly synthesized in the
ring stage’*"? (Fig. 2d). The localization of additional control proteins,
EXP2 and the apicoplast-resident protein (ACP), was also unaffected
by HSP101 knockdown (Extended Data Fig. 3b). As expected, export
was unaffected in wild-type parasites treated with ATc (Extended Data
Fig. 3a). In summary, knockdown of HSP101 in P. berghei induced a
profound defect in the capacity of both PEXEL-containing proteins and
PNEPs to enter the cytosol of the infected erythrocyte, with a consequential
detrimental effect on parasite growth, thereby highlighting an
essential function for PTEX.

In a parallel approach, an inducible ribozyme system” was used to
generate a conditional PTEX150 knockdown in the human malaria parasite
P. falciparum. CS2 was used as the parental parasite strain because this
line expresses a stable and well-characterized PfEMP1 phenotype’. The
gene encoding PTEX150 was modified to incorporate the glucosamine-
inducible glmS ribozyme within its 3’ untranslated region to generate a
line termed PTEX150-HAglmS (Extended Data Fig. 4). A control line,
PTEX150-HA, identical to PTEX150-HAglmS except for the absence
of the glmS ribozyme, was also generated.

PTEX150-HAglmS parasites exposed to glucosamine at the trophozoite
stage (Fig. 3a, b, day 0) remained capable of invading erythrocytes
normally (day 1). However, in contrast to the control line, they could
not advance beyond the early trophozoite stage and hence could not
progress to the next parasite cycle when glucosamine concentrations of
0.6 mM or more were used (day 3) (Fig. 3b and Extended Data Fig. 5).
At glucosamine concentrations less than this, DNA replication and
growth progressed but were slightly lower than in the control line, in a
dose-dependent manner.

Figure 1 | Inducible knockdown of P. berghei HSP101 (i101 KD).
a, Diagram of a parasite-infected erythrocyte (RBC), the location of PTEX and
proteins investigated in this study. ER, endoplasmic reticulum; GAPDH,
glyceraldehyde-3-phosphate dehydrogenase. b, c, PCR (b) and Southern blot
(c) of Pbi101 KD (I) and PbANKA wild-type (WT) parasites. kb, kilobases;
INT, integration. d, Representative experiments (n = 3) showing that growth of
Pbi101 KD in vivo is affected by ATc. Error bars show s.e.m. for three mice per
condition, performed in parallel. e, Giemsa-stained blood smears from
representative experiments (n = 3), showing parasites treated with ATc for the
indicated period fail to recover in vitro. For d and e, grey bars and asterisks
indicate when export was analysed. f, Downregulation of hsp101 transcript in
Pbi101 KD parasites exposed to ATc. gDNA, genomic DNA. g, Western blot
analysis showing a more than 85% decrease in HSP101 expression in parasites
exposed to ATc (n = 2).

1Macfarlane Burnet Institute for Medical Research and Public Health, Melbourne, 3004, Australia. 2Monash University, Clayton, Victoria, 3800, Australia. 3Deakin University, Waurn Ponds, 3216, Australia.

When glucosamine was added at the trophozoite stage, PTEX150 
protein levels were reduced in PTEX150-HAglmS ring-stage parasites 
in a glucosamine dose-dependent manner, with more than 50% knock- 
down at glucosamine concentrations above 0.3 mM (Fig. 3c and Extended 
Data Fig. 6a). In contrast, PTEX150 levels were unaffected in PTEX150- 
HA control parasites exposed to glucosamine. The smaller but reprodu- 
cible glmS-specific decrease observed for the control protein endoplasmic 
reticulum-resident calcium-binding protein (ERC) could be explained 
by the strong PTEX150 knockdown-specific growth effect, which we 
expected would reduce the expression of ERC and other control proteins 
(Extended Data Fig. 6a, b). Together these data indicated that PTEX150 
levels were specifically reduced by the addition of glucosamine to PTEX150- 
HAglmS parasites, and that blood-stage development was exquisitely 
sensitive to PTEX150 levels, with 50% or more knockdown effectively 
ablating development to the mature trophozoite stage in the parasite 
cycle after glucosamine treatment. 

Figure 2 | Knockdown of P. berghei HSP101 blocks export of PEXEL and
PNEP proteins. a, Surface labelling of parasite antigens on Pbi101 KD
parasites harvested between days 1 and 2 post infection from mice pretreated
with ATc was substantially decreased compared with infected erythrocytes
not exposed to ATc as measured by FACS (n = 8; error bars represent s.e.m.;
***P < 0.001, using unpaired t-test). Boxes and whiskers delineate all data
points, with whiskers indicating minimum and maximum values. b, Surface
labelling of parasite antigens on asynchronous Pbi101 KD parasites grown to
high parasitaemia and then treated with ATc for either 12 or 24 h (n = 8;
error bars represent s.e.m.). c, Representative IFA of 100 Pbi101 KD
intraerythrocytic stages, showing that exposure to ATc blocks export of PEXEL
(top and middle panels) and PNEP (bottom panel) proteins. Yellow bar in
all diagrams, signal sequence; red bar, transmembrane domain; black bar,
glycosylphosphatidylinositol anchor. DIC, differential interference contrast.
d, Expression of MSP8 is not affected by ATc. Scale bars, 5 µm.

With this knockdown approach we used quantitative IFA to examine
the export of four different exported proteins: an early expressed PEXEL
protein, ring-infected erythrocyte surface antigen (RESA)”; a ‘soluble’
PEXEL-containing protein, P. falciparum knob-associated histidine-rich
protein (KAHRP)”; a double transmembrane and PEXEL-containing
protein, Hyp8 (refs 3, 24); and a PNEP, skeleton binding protein 1 (SBP1)”.
Both Hyp8 and SBP1 localize to Maurer’s clefts, membranous structures
that reside in the erythrocyte cytosol. RESA was a significant inclusion 
because it is exported in very early ring stages, within about 1h after 
invasion” and before the growth inhibitory effect of glucosamine, con- 
trolling for any non-specific effect of PTEX150 knockdown on export. 

Figure 3 | Generation of a PTEX150 knockdown line in P. falciparum.
a, PTEX150-HAglmS parasites (left), but not the control PTEX150-HA
parasites (right), fail to proliferate when treated with glucosamine (GlcN) at a
concentration of 0.6 mM or higher in the previous cycle (n = 2). b, Giemsa-
stained P. falciparum cells, showing arrest of growth in the pigmented
trophozoite stage (21-33 h post invasion (hpi)) in 2.5 mM GlcN added to the
previous cycle. c, Western blot analysis: similarly treated PTEX150-HAglmS
parasites, but not the control PTEX150-HA parasites, show an up to 80%
decrease in PTEX150 protein levels (n = 2). HA, haemagglutinin epitope tag.

We used two concentrations of glucosamine: the sublethal level of
0.15 mM, to minimize any indirect effect of growth arrest on export,
and a high dose of 2.5 mM. Because both RESA and KAHRP are found
throughout the cytosol during the ring stages, we quantified their export
by measuring the mean fluorescence intensity in the erythrocyte cytosol
at different times after invasion, excluding the region occupied by the
parasite (as denoted by staining with 4’,6-diamidino-2-phenylindole 
(DAPI) and staining for EXP2). Using this approach, a strong block- 
age in export was seen at both 2.5 mM and 0.15 mM glucosamine in 
PTEX150-HAglmS parasites (Fig. 4a, b and Extended Data Fig. 7). This 
defect was not seen without glucosamine or in the control PTEX150- 
HA line (Fig. 4a, b). For SBP1 and Hyp8, Maurer’s clefts in the erythro- 
cyte cytosol were counted using an automated quantitative microscopic 
approach. Again, specific knockdown of PTEX150 led to a strong defect 
in export of these proteins (Fig. 4c, d and Extended Data Figs 8 and 9), 
both of which seemed to be blocked at the parasitophorous vacuole. To 
further control for non-specific effects on vesicular trafficking, we exam- 
ined the localization of MSP8 under PTEX150 knockdown conditions. 
No effect on either MSP8 expression or localization to the parasite mem- 
brane was observed (Fig. 4e). 

Finally, we used pooled VAR2CSA reactive immune serum and found
that surface PfEMP1 expression was markedly reduced as a result of
PTEX150-specific knockdown, even at relatively low levels of glucosamine
(Fig. 4f). Consistent with this, we also demonstrated a smaller
proportion of total PfEMP1 expressed on the surface of PTEX150 knockdown
parasites by using a cytoadherence assay for chondroitin sulphate
A and a trypsin sensitivity assay (Fig. 4g and Extended Data Fig. 6c).
Although PfEMP1 requires many PEXEL and PNEP proteins for its trafficking
to the infected erythrocyte surface’”’, the fact that under knockdown
conditions it remained trapped in the parasite and was not present
in the host cell cytosol or at the Maurer’s clefts is consistent with the
direct trafficking of PfEMP1 by PTEX (Extended Data Fig. 6d).

Thus, using a number of different approaches to specifically knock
down the expression of PTEX components we have shown that all parasite
protein classes destined for the erythrocyte cytosol were prevented
from crossing the parasitophorous vacuole membrane. This is direct
evidence that the role of the PTEX molecular machine is to translocate
proteins across this membrane (summarized in Fig. 1a). As a nexus for
all proteins destined for the host cell cytosol, and directly or indirectly
for parasite proteins expressed on the infected erythrocyte surface, this
complex becomes an attractive target for drug development. Consistent
with its potential as a potent drug target, we show here that even
modest knockdown of PTEX components had a strong inhibitory effect
on Plasmodium growth in vitro. In vivo, the inhibitory effect is likely to
be even stronger because the processes involved in pathogenesis, such
as those designed to avoid splenic clearance’”*”, will also be disrupted
by compounds inhibiting PTEX function.

Figure 4 | PTEX150 knockdown blocks protein export in P. falciparum.
a-e, IFAs (right) and graphs (left) showing a decrease in the export of RESA
(a) and KAHRP (b) (mean fluorescence intensity, MFI) and of SBP1 (c)
and Hyp8 (d) (Maurer’s clefts, MCs) (n = 12-47 cells for each antibody or
GlcN concentration) but similar levels of MSP8 (e) (n = 15-35) after treatment
with GlcN. Boxes and whiskers delineate 25th-75th and 10th-90th centiles,
respectively. Colours of bars in diagrams as in Fig. 2. Scale bars, 5 µm. f, Flow
cytometry analysis showing decreased export of VAR2CSA onto the
erythrocyte surface after treatment with GlcN (n = 3), at 24-28 h after invasion
(top) and at 32-36 h after invasion (bottom). g, Cytoadherence of PTEX150-
HAglmS to chondroitin sulphate A (n = 2). Bars represent means ± s.d.
*P < 0.05; **P < 0.01; ***P < 0.001 as determined by unpaired t-test, with the
exception of cytoadherence studies, in which a Mann-Whitney test was used.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Infection of mice with P. berghei parasites. Female Balb/c mice (6-8 weeks of 
age) were randomized into groups and infected intraperitoneally with 10’ parasitized 
erythrocytes. Parasitaemias were determined by Giemsa-stained blood smears; at 
least 1,000 erythrocytes were counted. All experiments involving mice were performed 
in strict accordance with the recommendations of the Australian Government and 
the NHMRC Australian code of practice for the care and use of animals for scien- 
tific purposes. Protocols were approved by the Deakin University Animal Welfare 
Committee (approval no. AWC A97/10). 

Plasmid constructs. The construct pTg-ranTRAD4-iHSP101 was used to generate 
the P. berghei i101 KD. This plasmid was based on pPRF-TRAD4-Tet07-HAPRF-
hDHFR’ but modified to include a BsiWI restriction site downstream of the profilin
coding sequence and a BssHII restriction enzyme site between the profilin 5’
untranslated region and the TRAD4 sequence. This enabled cloning of the first 
1.7 kilobases (kb) of the HSP101 coding sequence (PbANKA_09312; amplified with 
DO390F, 5’-caccetgcagATGGTACGGAACATTGCTAAAAATT-3’, and DO414R, 
5'-gtategtacgccatggCTATAACTCTTGGTTTACCCG-3’, the latter containing an 
internal NcoI site) into the PstI and BsiWI cloning sites and 0.85 kb of the HSP101
5’ untranslated region (amplified using oligonucleotides DO392F, 5'-gtaccatggC 
GTACGGTATGCAATTGCTCTTAATGCATTTGC-3’, and DO394R, 5’ -tatgegege 
TTTCTACTAAATTTATAGTAAATATAGATATA-3’) into the NcoI and BssHII
sites. Before transfection, DNA was linearized with NcoI.

pPTEX150-HA-glmS was produced by excision of the strep-tag from pPTEX150-
HA/Str3’ (ref. 10) and the introduction of a stop codon followed by a 3’ glmS sequence.
Transfection. The reference clone 15cy1 from the P. berghei ANKA strain was used
to generate P. berghei transgenic parasites. Transfection of schizont-stage parasites
with 1 µg of linearized DNA construct was performed with the Nucleofector electroporation
device (Amaxa), using previously described protocols”; stable transfectants
were selected by adding pyrimethamine at a final concentration of 0.07 mg ml⁻¹
to the drinking water of mice. For the Pbi101 KD parasites, PCR and Southern blot
analysis (performed as outlined below) of the pyrimethamine-resistant population 
obtained after transfection revealed that the parasites were already clonal. Never- 
theless, limiting-dilution cloning was performed on this population, and both the 
original homogeneous population and cloned line were used for phenotypic analysis. 

For P. falciparum, 100 µg of pPTEX150-HA and pTEX150-HAglmS were used
to transfect CS2 parasites that had recently been selected for chondroitin sulphate 
A (CSA) binding”. Transfected parasites were selected with 2.5 nM WR99210 and 
were cycled on and off the drug to select for integration into the ptex150 locus. 
Nucleic acid analysis. The genotype of the P. berghei i101 KD was confirmed by 
Southern blot analysis of genomic DNA isolated from infected rodent blood. Nucleic 
acid probes were synthesized using the DIG PCR Probe Synthesis kit (Roche); 
detection was performed with the DIG Luminescent Detection kit (Roche) in accor- 
dance with the manufacturer’s protocol. PCR was also used to confirm integration 
at the 5’ and 3’ ends and the purity of the population using a combination of the 
following oligonucleotides: a, 5’-TTATAGTTTAGAACACCAAGGACG-3’; b, 
5'-GCCTTCGATACCGACTTCATTGAG-3’; c, 5’-CTTTCGATACCGTCGAC 
CTCGAG-3’; d, 5'-TTTTGCTTAATGGCTCGAAAA-3’. To detect PTEX tran- 
scripts in P. berghei ANKA parasites by RT-PCR, RNA was extracted from blood- 
stage parasites, using the NucleoSpin RNA II Kit (Macherey—Nagel) followed by 
treatment with DNase I (Invitrogen). cDNA was then made using the Omniscript
RT Kit (Qiagen) in accordance with the manufacturer’s manual. cDNA (or genomic 
DNA as a control) was used in PCR reactions using the oligonucleotides to detect 
hsp101 (DO390, 5'-ATGGTACGGAACATTGCTAAAAATT-3’; DO391, 5’-CC 
AAATTGTTCAATGTTTAATCCAG-3’), rap1 (DO186, 5'-GATTATTCTGTG 
GCATTTAACAT-3’; DO187, 5’-GAAGGTAATCATTTTTTGTGG-3’) and ptex150 
(MK28, 5’-AATGACCAGCCAATTGTTCC-3’; MK29, 5'-TGCATCTTTGCCT 
TCTTCCT-3’). 

Correct integration of the plasmids into the ptex150 locus (PF3D7_1436300) was
confirmed by PCR on DNA template isolated from the parasite lines using primers
A (5'-CGTTGTAAATTCTAAATATGCTGATAATTCC-3’), B (5’-TTCTTTTA
ATTTTTTTTCTTTAGCTCTCCATTGT-3’) and C (5'-CCGGGACGTCGTAC 
GGGTATGCTG-3’). 
Treatment with ATc. For the in vivo exposure of parasites to ATc, mice were ran- 
domized into groups of three for each experiment and then given drinking water 
containing 0.2 mg ml⁻¹ ATc (Sigma) made in 5% sucrose. ATc was either administered
to mice from 24 h before infection or, in other experiments where indicated,
parasites were grown to a higher parasitaemia (~5-10%) in mice in the absence of 
ATc, and then exposed to ATc for designated durations. A minimum of 1,000 eryth- 
rocytes were counted to determine the parasitaemia. For in vitro treatment, para- 
sites harvested from mice at ring stage were grown until schizont stage at 36.5 °C 
(~16h) in RPMI 1640 medium containing L-glutamine (Life Technologies) sup- 
plemented with 25 mM HEPES, 0.2% bicarbonate, 25% fetal bovine serum and 
1 µg ml⁻¹ ATc (or vehicle as a control).


Treatment with glucosamine. P. falciparum CS2 PTEX150-HA and CS2 PTEX150- 
HAglmS parasites were synchronized with 5% sorbitol; at 24h after invasion, 1 M 
glucosamine (Sigma) was added to various final concentrations (0.075-2.5 mM) as 
well as a 0 mM glucosamine control. For microscopy, later in the same cell cycle, 
heparin sulphate (Sigma) was added to prevent new invasions. The heparin was 
washed out and added back 3 h later to give the parasites a 3 h invasion window. At 
times corresponding to 4-7, 8-11, 12-15, 16-19 and 20-23 h after invasion, thin 
blood smears were made from all cultures and smears were allowed to dry in air. 
Slides were stored at —20 °C until needed. To make highly synchronous young ring- 
stage parasites, merozoites were prepared as described” and allowed to invade for 
10 min before heparin sulphate was added to inhibit new invasions. At times cor- 
responding to 1 and 3 h after invasion, thin blood smears were made from all cultures 
and treatments and were allowed to dry in air before being stored frozen until needed. 
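
The glucosamine dosing described above is a standard C1V1 = C2V2 dilution from the 1 M stock. The Python sketch below computes the stock volume needed for a given final concentration. The function name and the 10 ml culture volume in the usage note are assumptions for illustration (culture volumes are not stated in the text), and the small volume contributed by the stock itself is ignored.

```python
def stock_volume_ul(final_mm, culture_volume_ml, stock_m=1.0):
    """Microlitres of stock needed to reach final_mm (mM) in the culture.

    Uses C1 * V1 = C2 * V2 and ignores the volume added by the stock.
    """
    stock_mm = stock_m * 1000.0  # mol per litre -> mmol per litre
    volume_ml = final_mm * culture_volume_ml / stock_mm
    return volume_ml * 1000.0  # millilitres -> microlitres


# Final concentrations (mM) spanning the stated 0.075-2.5 mM range.
FINAL_CONCENTRATIONS_MM = [0.075, 0.15, 0.3, 0.6, 1.25, 2.5]
```

For a hypothetical 10 ml culture, reaching 2.5 mM takes about 25 µl of 1 M stock and 0.075 mM takes about 0.75 µl, so in practice an intermediate dilution of the stock would make the lowest doses easier to pipette.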
Western blot analysis. P. berghei ring-stage parasites administered to mice pre- 
exposed to ATc or vehicle control were harvested 29 h later and lysed with 0.09% 
saponin. Equal volumes of parasite material were fractionated in 8% acrylamide Bis-
Tris gels and blotted onto 0.45 µm poly(vinylidene difluoride) membrane (Millipore).
The membranes were blocked in 3% BSA in PBS and probed with rabbit anti-HSP101
(1:200 dilution), rabbit anti-EXP2 (1:200) or rabbit anti-MSP8 (1:1,000). After washing,
the membranes were probed with horseradish peroxidase-conjugated secondary
antibodies, and detection was performed with SuperSignal enhanced chemiluminescence
(Thermo Fisher Scientific). Quantification of signal strength was performed
with NIH ImageJ version 1.47d. Western blots were performed on samples made
from two independent experiments and were analysed twice. 
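
One common way to turn band intensities into a knockdown percentage is to normalize the target band to a loading control in each lane and then compare treated with untreated lanes. The Python sketch below shows that calculation as an assumption about the general approach; the text does not spell out the exact formula used, and the function name and the intensity values in the usage note are hypothetical.

```python
def percent_knockdown(target_treated, loading_treated, target_control, loading_control):
    """Percentage decrease of a loading-normalized band after treatment.

    Each target intensity is divided by the loading-control intensity from
    the same lane before the treated and control lanes are compared.
    """
    treated_ratio = target_treated / loading_treated
    control_ratio = target_control / loading_control
    return 100.0 * (1.0 - treated_ratio / control_ratio)
```

For example, a treated lane with a target signal of 15 and a loading signal of 100, compared with a control lane of 100 and 100, corresponds to an 85% knockdown.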

P. falciparum parasites were treated with various concentrations of glucosamine
when about halfway through their cell cycle. At mid ring stage in the next cycle (~12 h
after invasion) the parasites were harvested and were freeze-thawed once to break 
open the erythrocyte compartment and release the haemoglobin, which was then 
washed out with PBS. Equal amounts of parasite proteins were fractionated in 4-12% 
acrylamide Bis-Tris gels (Novex Life Technologies) and blotted onto nitrocellulose 
membrane. The membranes were blocked in 1% casein in PBS and were probed with 
the following primary antibodies: chicken anti-HA epitope (1:1,000; Abcam), rabbit 
anti-ERC (1:500), rabbit anti-HSP70-1 (1:500), rabbit anti-GAPDH (1:2,000), rabbit
anti-HSP101 (1:500) and a mouse anti-RESA monoclonal antibody (mAb 1812;
1 µg ml⁻¹). After washing, the membranes were probed with fluorescent secondary
antibodies and detected with a Li-Cor Odyssey FC scanner. Band densities were
measured using the scanner’s software. Four separate expression assays were per- 
formed and showed similar trends. The densitometry data presented here are from 
one assay analysed twice by western blotting. 
Growth assays. P. berghei schizonts obtained from overnight parasite cultures grown 
in 150 ml of complete RPMI were burst by mechanical action, using a fine needle 
syringe. Viable merozoites, purified by filtration through a 0.2 µm filter, were resuspended
in fresh medium and combined 1:1 with fresh erythrocytes. Invasion was
allowed to proceed for 30 min at 37 °C with vigorous shaking. Cells were either 
maintained in culture in vitro or intravenously injected into naive mice to establish 
synchronous mouse infections. 

For P. falciparum assays, glucosamine was added to a final concentration of 0.075, 
0.15, 0.3, 0.6, 1.25 or 2.5 mM, with 0 mM glucosamine serving as the negative control.
Thin blood smears of the cultures were air-dried before being stained with Giemsa. 
A minimum of 2,000 erythrocytes were counted to determine the parasitaemia of 
the culture. 
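
The counting rule above amounts to a simple proportion. The following Python sketch is illustrative only: the function name, argument names and error handling are assumptions and are not part of the published protocol. The only value taken from the text is the minimum of 2,000 erythrocytes counted per smear.

```python
def parasitaemia_percent(infected, total, minimum_counted=2000):
    """Return parasitaemia as a percentage of erythrocytes counted.

    `minimum_counted` reflects the stated minimum of 2,000 erythrocytes;
    everything else here is a hypothetical helper for illustration.
    """
    if total < minimum_counted:
        raise ValueError(
            f"only {total} erythrocytes counted; at least {minimum_counted} required"
        )
    if not 0 <= infected <= total:
        raise ValueError("infected count must lie between 0 and the total counted")
    # Parasitaemia is the fraction of counted erythrocytes that are infected.
    return 100.0 * infected / total
```

For example, 50 infected cells among 2,000 counted would correspond to a parasitaemia of 2.5%.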

Immunofluorescence analysis of P. berghei. Erythrocytes infected with P. berghei 
parasites were fixed for 30 min with 4% paraformaldehyde and 0.0075-0.015% glu- 
taraldehyde in PBS. After washing, cells were permeabilized for 10 min with either 
0.25% Triton X-100 or 0.5% Triton X-100 where indicated, and then washed a fur- 
ther three times, before blocking for 30 min with 1% BSA/PBS (Sigma). Cells were 
labelled for 1 h at 20-22 °C or 16 h at 4 °C with anti-EXP2 (1:300), anti-ACP (1:300),
anti-PbANKA_114540 (1:150), anti-PbANKA_122900 (1:150), anti-PbANKA_083680
(1:150) or anti-MSP8 (1:1,000) (all raised in rabbits), washed and then sequentially 
labelled for 1 h with goat anti-rabbit AlexaFluor 488/568 secondary antibody (1:2,000; 
Life Technologies). Cells were mounted in Vectashield containing the nuclear stain 
DAPI (VectorLabs). For each experiment, at least 100 parasitized erythrocytes were 
counted on two occasions by different researchers; where quantification was per- 
formed, samples were blinded. For co-labelling experiments, anti-PbANKA_114540 
(1:500) and goat anti-rabbit secondary AlexaFluor 488 antibody (1:300-500) were 
added sequentially for 1 h, followed by either anti-EXP2 (1:300), anti-ACP (1:300)
or no antibody for 16 h, with a final incubation for 1 h with goat anti-rabbit AlexaFluor 
568 secondary antibody (1:2,000). For localization of PbANKA_114540 and MSP8,
anti-MSP8 (1:1,000) and goat anti-rabbit secondary AlexaFluor 568 antibody
(1:300) were added sequentially for 1 h, followed by anti-PbANKA_114540
or no antibody for 16 h, with a final incubation for 1 h with goat anti-rabbit AlexaFluor 
488 secondary antibody (1:2,000). Images were acquired independently by at least 
two researchers with an Olympus IX70 microscope and processed using NIH
ImageJ version 1.47d or Adobe CS6 Photoshop.

P. falciparum IFA, image scoring and statistics. For P. falciparum IFA, blood 
smears of parasites were thawed, then fixed in ice-cold 100% methanol for 5 min 
before being air-dried. Three time courses were used: a short one for RESA in which 
parasites were synchronous within a 10-min window, a mid-range course for SBP, 
Hyp8, KAHRP and MSP8 in which parasites were within a 3 h window, and a long
course for PfEMP1 where the window was 5 h. The parasites were rehydrated and
blocked for 1 h in 3% BSA (Sigma) in PBS. The cells were then probed with anti-EXP2
(3 µg ml⁻¹), anti-SBP (1:200) and anti-RESA (1 µg ml⁻¹) mouse monoclonal
antibodies and rabbit serum for SBP (1:200), KAHRP (1:1,000), Hyp8 (1:200),
PfEMP1 ATS (1:500) or MSP8 (1:1,000) diluted in 3% BSA in PBS. After being 
washed three times in PBS, the cells were probed with goat anti-mouse AlexaFluor 
568 (1:2,000; Invitrogen) and goat anti-rabbit AlexaFluor 488 (1:2,000; Invitrogen) 
in 3% BSA in PBS for 1 h. After being washed three times in PBS, the cells were
mounted in Vectashield with DAPI. To avoid potential bias, the parasites were
selected for imaging solely on the basis of their DAPI (nuclear) staining and then
imaged in the green fluorescent protein, Texas Red, ultraviolet and DIC channels
on a Zeiss Axio Observer microscope. For each time point the same exposure times
were used, to produce consistent fluorescence intensities. After imaging 23 Z
sections at 0.28 µm apart for each cell, a Z-projection was made and used to score
the degree of protein export. 

Before the images were scored in FIJI v.1.48, their file names were de-identified
of parasite line information. To score the mean level of MSP8 expression in
whole infected erythrocytes, the cell area was selected and the mean fluorescence
intensity was obtained by means of the ‘Measure’ function. To score the degree of 
KAHRP and RESA export, the circumference of the infected erythrocyte was first 
traced around, followed by the parasite (denoted by EXP2 and DAPI staining) to 
exclude it from subsequent export analysis. The mean fluorescence intensity of 
KAHRP and RESA was then quantified as above. The labelling of SBP and Hyp8 in
the infected erythrocyte cytosol was similarly traced and the punctate Maurer’s
clefts were counted, using the ‘Find Maxima’ function set to a noise tolerance of 200 
and ‘Point Selection’ output type. The mean fluorescence intensities for KAHRP- 
labelled cells and the number of Maurer’s clefts for SBP and Hyp8 were graphed in 
GraphPad Prism. Unpaired t-tests using parametric distribution were performed to 
measure differences between glucosamine-treated and untreated parasites and were 
assumed to be significant when P < 0.05. A Mann-Whitney test was used for the 
RESA data. 
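
To make the non-parametric comparison used for the RESA data concrete, the sketch below computes a two-sided Mann-Whitney U test in plain Python. It is an illustration under stated assumptions, not the procedure run in GraphPad Prism: the function name is hypothetical, the P value uses the normal approximation without a tie or continuity correction, and Prism may instead report an exact P value for small samples.

```python
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples and sort, remembering each value's original position.
    pooled = sorted((value, index) for index, value in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    start = 0
    while start < len(pooled):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = average_rank
        start = end + 1
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u  # z <= 0 because u is the smaller statistic
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p
```

With this sketch, two groups would be called significantly different at the threshold stated above when the returned `p` is below 0.05.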
Flow cytometry. For analysis of P. berghei surface antigens using fluorescence-activated
cell sorting (FACS), 20 µl of blood collected from the tail vein of P. berghei-infected
mice was washed briefly in RPMI and then blocked for 1 h in 1% casein in
RPMI. Erythrocytes were then incubated for 1 h with serum harvested from either 
P. berghei semi-immune or non-immune (pre-bleed) mice, generated as described
and diluted 1:20 in blocking solution. After three washes with blocking solution, cells
were incubated for 1 h with goat anti-mouse IgG AlexaFluor 647 (1:2,000; Invitrogen),
washed a further three times and then incubated for 5 min in SYBR Safe (Invitrogen)
diluted 1:2,000 in blocking solution. A further three washing steps were performed,
after which the cell preparation was analysed with a FACS Canto II machine (BD 
Biosciences). All incubations were performed at 20-22 °C. 

P. falciparum cultures were synchronized with sorbitol and then CS2 PTEX150-HA
and CS2 PTEX150-HAglmS parasites were cultured in the presence of glucosamine
at indicated concentrations at 24-30 h after invasion. Heparin sulphate
(100 µg ml⁻¹; Sigma) was also added to prevent new invasion events. Parasites
incubated with no glucosamine served as controls. Heparin was removed to permit
invasion; 4 h later, parasites were synchronized with 5% sorbitol to remove trophozoites
and unruptured schizonts, resulting in an invasion window of 4 h (ref. 32).
Samples were taken at 24-28, 32-36 and 38-42 h after invasion, and the surface
expression of the PfEMP1 variant VAR2CSA was measured.

To test for surface VAR2CSA expression on these parasites, serum samples were 
collected from multigravid women who attended antenatal care at Alexishafen Health
Centre, Madang Province, Papua New Guinea, which is endemic for P. falciparum
malaria. Serum samples were screened by ELISA for reactivity against VAR2CSA-DBL5
recombinant protein, and a pool of five high responder samples was made;
reactivity of serum IgG in this pool to the surface of intact CS2-infected erythrocytes 
was confirmed using flow cytometry. A pool of serum from malaria-naive indivi- 
duals from Melbourne, Australia, was used as a negative control. Serum IgG binding 
to the surface of infected erythrocytes was measured by flow cytometry as described 
previously. In brief, samples taken at various time points were washed in PBS
with 0.1% casein, resuspended at 0.2% haematocrit, and incubated sequentially with
test serum diluted 1:40 in 0.1% casein in PBS, then with polyclonal rabbit anti-human
IgG (1:100) in 0.1% casein, and lastly with 10 µg ml⁻¹ goat anti-rabbit AlexaFluor 488
and 10 µg ml⁻¹ ethidium bromide in 0.1% casein. All incubations were for 30 min
at 20-22 °C, with three washes in 0.1% casein between incubations. Cells were
resuspended in PBS, and data were acquired with a FACS Canto II machine and 
analysed with FlowLogic software (eBioscience). After gating for erythrocytes, serum 
IgG binding for each sample was expressed as the geometric mean fluorescence 
intensity (MFI) for trophozoites after subtracting the MFI of uninfected erythro- 
cytes. Samples were considered as positive for IgG binding when the MFI was more 
than three s.d. above the mean binding seen with malaria-naive control sera. Ethics 
approval for human studies was provided by the Alfred Hospital Human Research 
and Ethics Committee, and the Medical Research Advisory Committee of Papua 
New Guinea. All participants gave written informed consent. 
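
The scoring rule for serum IgG binding described above (background-subtracted geometric MFI, called positive when more than three s.d. above the mean of malaria-naive controls) can be sketched as follows. This is a minimal illustration and not the FlowLogic workflow: the function names and inputs are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import geometric_mean, mean, stdev


def corrected_mfi(trophozoite_intensities, uninfected_intensities):
    """Geometric MFI of trophozoite-infected cells minus that of uninfected cells."""
    return geometric_mean(trophozoite_intensities) - geometric_mean(uninfected_intensities)


def is_positive(sample_mfi, naive_control_mfis, n_sd=3.0):
    """True when the sample MFI exceeds mean + n_sd * s.d. of naive control MFIs.

    Uses the sample standard deviation (an assumption, see lead-in).
    """
    threshold = mean(naive_control_mfis) + n_sd * stdev(naive_control_mfis)
    return sample_mfi > threshold
```

Here `naive_control_mfis` would hold the corrected MFI values obtained with malaria-naive control sera, and at least two such values are needed for a standard deviation to be defined.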

Cleavage of surface proteins with trypsin. P. falciparum parasites were cultured 
in RPMI-HEPES supplemented with 10% human serum. Glucosamine was added 
at 0 and 0.3 mM 24 h after invasion to CS2 PTEX150-HA and CS2 PTEX150-HAglmS
parasites. Midway through the following cell cycle, the parasites were purified by
magnetic separation and were treated with RPMI-HEPES with 5% sucrose, with or
without 1 mg ml⁻¹ trypsin, for 1 h at 37 °C (ref. 41). Soybean trypsin inhibitor was
added to stop the reaction, and the parasites were solubilized in 1% Triton X-100 
on ice for 20 min and centrifuged at 18,000g for 5 min. The pellet fraction containing 
PfEMP1 was solubilized by sonication in 2% SDS at 20-22 °C and equal amounts 
were separated by electrophoresis on 3-8% Tris-acetate polyacrylamide (Novex Life 
Technologies) and transferred to a nitrocellulose membrane. The membrane was 
probed with monoclonal anti-PfEMP1 acidic terminal segment (ATS) (1B/98 8AB-
19-18 at 6 µg ml⁻¹) and goat anti-mouse antibody IRDye 800 Conjugated (Rockland)
at 1:5,000 and imaged as above. 

Cytoadherence assays. Adhesion of CS2 PTEX150-HAglmS parasites and control
PTEX150-HA to the receptor CSA was tested after treatment with 0.15 mM glucosamine
(+) in the previous cycle. After a 30 min incubation of gelatin-enriched
parasites with immobilized CSA (from bovine trachea, 20 mg ml⁻¹), unbound cells
were washed and bound parasites were fixed with 2% glutaraldehyde in PBS, stained 
with Giemsa and counted by microscopy”. The number of cells adhered per square 
millimetre for each condition (tested in triplicate) was normalized to its (—) glu- 
cosamine control. The graph shows results from two independent experiments. Error 
bars represent s.d.; **P < 0.005 (Mann-Whitney test). 
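
The normalization step described above can be illustrated with a short Python sketch. It assumes, hypothetically, that each treated replicate is expressed as a percentage of the mean of the matching (−) glucosamine control replicates; the text does not specify the exact arithmetic, and the function and argument names are invented for illustration.

```python
from statistics import mean


def normalise_to_control(treated_counts, control_counts):
    """Express adhered cells per mm² as a percentage of the untreated control mean.

    `treated_counts` and `control_counts` are replicate counts (for example
    triplicates) for the (+) and (−) glucosamine conditions of one parasite line.
    """
    control_mean = mean(control_counts)
    if control_mean <= 0:
        raise ValueError("control mean must be positive to normalise against it")
    return [100.0 * count / control_mean for count in treated_counts]
```

Under this convention the control itself averages 100%, so a treated value of 50 would indicate half as many adherent cells per square millimetre as in the untreated control.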

Statistics. All graphs and data generated in this study were analysed using GraphPad 
Prism 6.0b Software (MacKiev). Unpaired t-tests using parametric distribution were 
performed to measure differences between untreated and treated (ATc or glucosamine)
parasites. P < 0.05 was considered significant.

Extended Data Figure 1 | Disruption of P. berghei TRX2 leads to reduced
protein export. a, IFA of fixed infected erythrocytes using P. berghei 
semi-immune sera reveals TRX2 knockout parasites (TRX2 KO) show reduced 
surface labelling compared with wild-type P. berghei ANKA parasites (WT), 
indicative of a reduction in expression of parasite antigens on the surface of 
erythrocytes infected with the TRX2 KO. Pre-bleed sera were used as a negative 
control. b, Quantitative FACS analysis of erythrocytes harvested from 
asynchronously infected mice (n = 6) show that two independent clonal 
populations of TRX2 KO parasites exhibit significantly reduced levels of surface 
labelling with P. berghei semi-immune sera compared with wild-type parasites 
(*P < 0.05; **P < 0.01; ***P < 0.001, unpaired t-test). c, As b, except that
synchronous mouse infections were initiated by injecting purified merozoites
into the tail veins of mice, and surface labelling of infected erythrocytes with 
semi-immune sera was performed at time points relative to when the wild-type 
line reinvaded erythrocytes for the second cycle (left); p.i., post invasion. Even 
taking into consideration that disruption of TRX2 leads to slower growth by 
about 6 h, the surface labelling of TRX2 KO parasites at a stage of growth
comparable to that of wild-type parasites is also significantly reduced (right) 
(n = 3 independent experiments). d, Giemsa smears showing the stages of 
parasite development at time points relative to when wild-type parasites had 
invaded erythrocytes. 


©2014 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 2 graphic. a, schematic of the endogenous HSP101 locus, the targeting construct and the locus after integration (−ATc, high mRNA; +ATc, low mRNA). b, two plots with x axis 'Day post infection'.]

Extended Data Figure 2 | Generation of a HSP101 knockdown line in
P. berghei. a, Schematic representation of the strategy used to construct Pbi101 KD
parasites. PCR primers used to detect 5′ integration (a/b), 3′ integration (c/d) and
wild-type locus (a/d) are indicated. E, EcoRI. b, Representative experiments
(n = 3) showing parasitaemias in mice that were (upper panel) or were not
(lower panel) pre-exposed to ATc in their drinking water before infection.
At day 4 after infection, the treatment regimens in both experiments were
switched. Error bars show s.e.m. for three mice per condition performed in
parallel.

Extended Data Figure 3 | Knockdown of HSP101 blocks protein export.
a, Representative IFA of intraerythrocytic stages showing that export of three
different P. berghei proteins across the parasitophorous vacuole membrane is
blocked when Pbi101 KD, but not wild-type, is exposed to ATc. Samples were
harvested at the times indicated by the asterisks in Fig. 1d and e. b, IFAs
show that correct localization of EXP2 and ACP is unaffected in Pbi101 KD
parasites treated with ATc (right panels). In these samples, cells were
permeabilized after fixation with 0.5% Triton X-100. Because the
PbANKA_114540, EXP2 and ACP antibodies were all raised in rabbits,
sequential labelling with anti-PbANKA_114540, anti-rabbit AlexaFluor 488,
anti-EXP2 or anti-ACP, and anti-rabbit AlexaFluor 568 had to be performed.
Control IFAs were therefore performed in which anti-EXP2 or anti-ACP
were omitted (left panels).

Extended Data Figure 4 | Diagnostic PCR analysis shows the ptex150 gene
has been appended with a HA tag in the PTEX150-HA parasites and a
HAglmS tag in the PTEX150-HAglmS parasites. a, Diagram of the
targeted genetic crossovers and binding sites of the PCR primers. b, PCR with the
indicated primer combinations shows that correct 3′ recombination has occurred in the
PTEX150-HA and PTEX150-HAglmS parasites: with primers A/C, a band
specific to the integrated locus (1.7 kb) is observed only in HA-tagged parasite
lines. c, Diagram showing how the glmS ribozyme, after binding glucosamine, is
stimulated to cleave its mRNA, resulting in message destabilization and a
decrease in protein levels.

Extended Data Figure 5 | Growth assays of PTEX150-HA and PTEX150-HAglmS
parasites show that growth of the latter declines substantially
after treatment with glucosamine (GlcN). CS2 PTEX150-HA and CS2
PTEX150-HAglmS parasites were treated with different concentrations of
glucosamine from 24-30 h after invasion (hpi) and then allowed to invade fresh
erythrocytes for 4 h. At the times after invasion indicated, the cells were stained
with ethidium bromide to measure DNA content as a marker for parasite
growth. Representative histograms show the levels of ethidium bromide
intensity (x axis) and cell number (y axis). Infected and uninfected erythrocytes
(uE) are shown as black and grey, respectively. Those parasites to the right of
the red line are the strongly staining trophozoites; those to the left are the
younger, weakly staining ring stages. Assays were performed at least three
times independently.

Extended Data Figure 6 | PTEX150-HAglmS protein levels are markedly
reduced on induction of the glmS ribozyme with GlcN. a, Western blots of
PTEX150-HAglmS and control PTEX150-HA mid-ring-stage parasites
(~12 hpi) probed with the antibodies indicated on the right. GlcN was added at
the concentration indicated above the blots, halfway through the previous cell
cycle. b, Western blots were performed in duplicate and densitometry of the
bands has been graphed showing the mean ± s.d. relative to no GlcN. Top:
PTEX150 levels in the PTEX150-HAglmS (150-glmS) parasites decrease with increasing
concentrations of GlcN to a minimum of ~17% of the level without GlcN. The
levels of PTEX150 in the control PTEX150-HA (150-HA) parasites do not
decrease in GlcN. Middle and bottom: the levels of co-regulated HSP101 and
RESA proteins and cytoplasmic constitutive HSP70-1, GAPDH and ERC
proteins also decline in the PTEX150-HAglmS parasites after treatment with
GlcN to about 50-60%, indicative of slowed growth due to loss of PTEX150
function. c, Western blot of infected erythrocytes treated with trypsin to cleave
off surface-exposed PfEMP1. The blot has been probed with a monoclonal
antibody against the intracellular C-terminal tail of PfEMP1, and the
densitometry of the 350 kDa VAR2CSA band (arrow) has been compared
between trypsin-treated and untreated infected erythrocytes to calculate the
percentage cleaved in the presence or absence of 0.3 mM GlcN. d, IFAs of
PTEX150-HAglmS probed for PfEMP1 and EXP2 after treatment with GlcN
indicate a decrease in the export of PfEMP1-containing structures to the
periphery of the infected erythrocyte. Scale bar, 5 µm.

Extended Data Figure 7 | Export of KAHRP in PTEX150-HAglmS (glmS) is
decreased after treatment with GlcN. The mean fluorescence intensity (MFI)
of the erythrocyte compartment in infected erythrocytes stained with rabbit
anti-KAHRP always declines after the addition of GlcN halfway through the
previous cell cycle. In comparison, treatment with GlcN does not consistently
decrease KAHRP export in the control PTEX150-HA (HA) parasites; the
variation is possibly due to inconsistencies in sample preparation. In the graphs,
the boxes and whiskers delineate the 25-75th and 10-90th centiles,
respectively. Outlying data points are shown as dots. Significances: *P < 0.05;
**P < 0.01; ***P < 0.001 by unpaired t-test. The number of cells (n) counted is
indicated below the graph. Example immunofluorescence images of only
PTEX150-HAglmS are shown. The regions occupied by the parasite are
indicated by staining with DAPI and staining for EXP2. Scale bar, 5 µm.

Extended Data Figure 8 | Export of SBP1 in PTEX150-HAglmS (glmS) is
decreased after treatment with GlcN. The number of punctate Maurer’s clefts
(MCs) present in the erythrocyte compartment in infected erythrocytes stained
with rabbit anti-SBP1 nearly always declines after the addition of GlcN
halfway through the previous cell cycle. In comparison, treatment with
GlcN does not consistently decrease SBP1 export in the control PTEX150-HA
(HA) parasites; the variation is possibly due to inconsistencies in sample
preparation. In the graphs, the boxes and whiskers delineate the 25-75th
and 10-90th centiles, respectively. Outlying data points are shown as dots.
Significances: *P < 0.05; **P < 0.01; ***P < 0.001 by unpaired t-test. The
number of cells (n) counted is indicated below the graph. Example
immunofluorescence images of only PTEX150-HAglmS are shown. The
regions occupied by the parasite are indicated by staining with DAPI and
staining for EXP2. Scale bar, 5 µm.

Extended Data Figure 9 | Export of Hyp8 in PTEX150-HAglmS (glmS) is
reduced following glucosamine treatment. The number of punctate Maurer’s
clefts (MCs) present in the erythrocyte compartment in infected erythrocytes
stained with rabbit anti-Hyp8 nearly always declines following addition of
GlcN halfway through the previous cell cycle. In comparison, GlcN treatment
does not consistently reduce Hyp8 export in the control PTEX150-HA (HA)
parasites and the variation is possibly due to inconsistencies in sample
preparation. In the graphs, the boxes and whiskers border the 25-75th and
10-90th percentiles, respectively. Outlying data points are shown as dots.
Significances: *P < 0.05; **P < 0.01; ***P < 0.001 by unpaired t-test. The
number of cells (n) counted is indicated below the graph. Example
immunofluorescence images of only PTEX150-HAglmS are shown. The
regions occupied by the parasite are indicated by staining with DAPI and
staining for EXP2. Scale bar, 5 µm.

PTEX component HSP101 mediates export of diverse
malaria effectors into host erythrocytes


Josh R. Beck, Vasant Muralidharan, Anna Oksman & Daniel E. Goldberg


To mediate its survival and virulence, the malaria parasite Plasmodium 
falciparum exports hundreds of proteins into the host erythrocyte. To
enter the host cell, exported proteins must cross the parasitophorous 
vacuolar membrane (PVM) within which the parasite resides, but the 
mechanism remains unclear. A putative Plasmodium translocon of 
exported proteins (PTEX) has been suggested to be involved for at 
least one class of exported proteins; however, direct functional evidence
for this has been elusive. Here we show that export across the
PVM requires heat shock protein 101 (HSP101), a ClpB-like AAA+ 
ATPase component of PTEX. Using a chaperone auto-inhibition strat- 
egy, we achieved rapid, reversible ablation of HSP101 function, result- 
ing in a nearly complete block in export with substrates accumulating 
in the vacuole in both asexual and sexual parasites. Surprisingly, this 
block extended to all classes of exported proteins, revealing HSP101- 
dependent translocation across the PVM as a convergent step in the 
multi-pathway export process. Under export-blocked conditions, asso- 
ciation between HSP101 and other components of the PTEX complex 
was lost, indicating that the integrity of the complex is required for 
efficient protein export. Our results demonstrate an essential and uni- 
versal role for HSP101 in protein export and provide strong evidence 
for PTEX function in protein translocation into the host cell. 

PTEX is a protein complex found in the PVM, where a translocon 
responsible for export into the host erythrocyte is expected. The timing
of its synthesis is appropriate for a putative translocon and it has a com- 
ponent (EXP2) with weak homology to bacterial haemolysins, which 
could form a membrane-spanning channel. PTEX contains two additional 
core components: the novel protein PTEX150 and HSP101, a member 
of the Clp/HSP100 family of AAA+ ATPases. To directly explore the 
role of HSP101/PTEX in protein export, we used a conditional auto- 
inhibition approach by fusing a dihydrofolate reductase (DHFR)-based 
destabilization domain (DDD) to the endogenous HSP101 carboxy ter- 
minus (Fig. 1a). Although this fusion strategy was originally developed 
to mediate conditional protein degradation via the proteasome, we
have found that the destabilized tag can conditionally interfere with
chaperone function without protein degradation. Following transfection,
two independent clones were isolated which had undergone the
intended recombination event (Extended Data Fig. 1). 

When the DDD-stabilizing small molecule trimethoprim (TMP) 
was removed from asynchronous HSP101ᴰᴰᴰ cultures, a complete block
in growth was observed, with parasites accumulating as late ring-stage
forms (Fig. 1b, c). Although growth inhibition was TMP-concentration-dependent,
this effect was not a result of HSP101ᴰᴰᴰ degradation (Extended
Data Fig. 2a, b). To determine sensitivity to HSP101ᴰᴰᴰ destabilization
across the asexual developmental cycle, TMP was removed at regular
intervals in synchronized HSP101ᴰᴰᴰ parasites. Removal of TMP in early
ring stages resulted in growth arrest. In contrast, when TMP was removed
at or after the beginning of the trophozoite stage (18-24 h post-invasion),
parasite development and re-invasion proceeded normally, followed by
arrest at the subsequent ring stage (Fig. 1d and Extended Data Fig. 2c, d).
Parasite growth was rescued by adding back TMP as late as 48 h after
withdrawal, indicating that arrested ring forms remained viable.

As protein export and PTEX expression also occur in parasite sexual
stages, we further evaluated the ability of HSP101ᴰᴰᴰ parasites to form
gametocytes required for malaria transmission. In gametocytogenesis
induction experiments, gametocyte development was severely inhibited
when TMP was removed during stage I (Fig. 1e, f and Extended Data
Fig. 3). Development was partially rescued by re-introducing TMP after
24 h. Consistent with PTEX degradation in later gametocyte stages,
gametocyte formation was largely unaffected by later TMP removal (stage
II onwards, 48 and 72 h), although a slight delay in development was
observed. Collectively, these results demonstrate that HSP101 serves an
essential function during asexual and sexual blood-stage development.

Figure 1 | HSP101 is essential for development of asexual and sexual blood
stages. a, Auto-inhibition strategy for HSP101ᴰᴰᴰ. TMP, trimethoprim; HA,
haemagglutinin tag; DDD, DHFR destabilization domain. b, Growth analysis
of asynchronous cultures of the two independent clones 13F10 and 14G11
with or without TMP. Error bars represent s.d. of three technical replicates.
Data are representative of three independent experiments. c, Giemsa-stained
smears of cultures following 48 h with or without TMP. Accumulation of late
ring-stage parasites is observed in the absence of TMP (arrows). Images are
representative of three independent experiments. d, Growth analysis of
synchronous 13F10 parasites. TMP was removed at the early trophozoite stage
and added back to cultures after 24, 48 or 72 h. Equivalent parasitaemia in all
samples at 24 h shows that development through trophozoite and schizont
stages, egress and reinvasion were not affected by TMP removal. Error bars as in
b. Data are representative of three independent experiments. e, Analysis of
gametocyte formation by 13F10 parasites. TMP was removed from late
schizonts following gametocyte induction (0 h) or at subsequent 24 h intervals.
In one sample, TMP was removed at 0 h and restored after 24 h (−/+ TMP
24 h). Gametocytaemia of various stages on day nine post-induction is shown.
Error bars as in b. Data are representative of four independent experiments.
f, Giemsa-stained smears of gametocyte cultures 9 days post induction.
Images are representative of four independent experiments. d, f, Original
magnification × 1,000.

To determine the impact of HSP101ᴰᴰᴰ inhibition on protein export,
we examined a panel of proteins representative of the diversity of solubility
states, targeting motifs, host-cell destinations and expression timing
observed among the P. falciparum repertoire of exported effectors.
Most exported proteins contain a pentameric Plasmodium export element
(PEXEL) motif that is cleaved by the aspartic protease plasmepsin V
in the parasite endoplasmic reticulum to license these proteins
for export. We first examined histidine-rich protein II (HRP2),
a soluble PEXEL protein that is exported into the erythrocyte cytosol
at the ring stage. To this end, TMP was removed from synchronized
HSP101ᴰᴰᴰ schizonts, which were then allowed to develop for 18-24 h
into rings. Remarkably, export of HRP2 into the host cytosol was completely
blocked in nearly all infected red blood cells (RBCs), with the
protein accumulating in the parasitophorous vacuole (PV) surrounding
the parasite (Fig. 2a, b). Similar results were obtained with the soluble,
PEXEL-containing protein REX3 and the membrane-associated, PEXEL-containing
protein KAHRP (Extended Data Fig. 4a, b and g). Among
PEXEL-containing proteins, ring-infected erythrocyte surface antigen
(RESA) is uniquely stored in secretory organelles of invading merozoites
and discharged into the PV along with PTEX immediately following
invasion. A robust block in RESA export was seen in recently invaded
parasites (=1 h post invasion), indicating that, in the absence of TMP,
HSP101ᴰᴰᴰ function is impaired from the time of invasion (Extended
Data Fig. 4c, g).

PEXEL-negative exported proteins (PNEPs) are not cleaved by plasmepsin
V but seem to contain export signals in their amino termini.
As PTEX has thus far been implicated only in translocation of PEXEL
proteins, we examined the PNEP REX1, a protein associated with
the cytosolic face of parasite-induced structures in the RBC cytosol
called Maurer’s clefts, which serve as platforms for sorting of exported
proteins. We again observed a block in export, with REX1 accumulating
in a ring around the parasite where it colocalized with HSP101ᴰᴰᴰ
(Fig. 2c and Extended Data Fig. 4g). A similar block was observed for
SBP1 and REX2, integral membrane PNEPs exported to the Maurer’s
clefts (Extended Data Fig. 4d–g). Export was readily restored when TMP
was added back to blocked cultures and restoration was largely insensitive
to cycloheximide, indicating that previously synthesized HSP101ᴰᴰᴰ and
exported proteins in the PV are sufficient to reactivate export (Extended
Data Fig. 5).

As later stages of parasite growth and development are not sensitive 
to TMP removal (Fig. 1d), we wondered if an export block could be 
induced after the ring stage. To this end, TMP was removed from 
synchronous HSP101ᴰᴰᴰ parasites at the late ring stage and export of
trophozoite/schizont-specific proteins was assessed 12-24 h later. We
first examined MSRP6, a soluble PNEP peripherally associated with
Maurer’s clefts. Parasites lacking TMP displayed a marked block of
MSRP6 export whereas SBP1, exported before the block, was found in
the erythrocyte (Fig. 2d and Extended Data Fig. 4g). Similar results were
seen for the PNEP PfEMP1, a variable surface antigen and key virulence
determinant for P. falciparum malaria (Fig. 2e and Extended Data
Fig. 4g). Importantly, PfEMP1 trafficking to the RBC surface is dependent
on several additional exported proteins. Thus, although our results
indicate delivery of PfEMP1 to the RBC surface is HSP101-dependent,
we cannot exclude the possibility that this block is an indirect result of a
failure to export other proteins required for PfEMP1 trafficking rather
than a direct effect of HSP101 inactivation. Finally, we examined PFA660,
a PEXEL-containing HSP40 found in parasite-induced membrane structures
in the RBC called J-dots. Again, washout of TMP efficiently blocked
export as visualized by expression of a PFA660-green fluorescent protein 
(GFP) fusion protein in live parasites (Fig. 2f and Extended Data Fig. 4g). 
Together, these data show that HSP101 function is necessary for export, 
but not for parasite development in the trophozoite and schizont stages. 

Formation of a normal digestive vacuole containing haemozoin (Fig. 2d-f,
DIC panels) indicates that haemoglobin uptake and degradation proceeded
normally in these parasites, a process requiring proteases that
are secreted into the PV before entering the digestive vacuole where they
are activated. Indeed, maturation of plasmepsin II was unaffected by
TMP removal, indicating normal trafficking through the PV to the
digestive vacuole (Extended Data Fig. 6). Additionally, HSP101 inactivation
did not affect egress, a process mediated by secretion of the PfSUB1
protease into the PV. Because PV processes unrelated to export proceed
normally when HSP101 function is interrupted, HSP101 seems to serve a
specific role in export. Furthermore, parasites expressing a PV-targeted
GFP^DDD fusion protein showed no defect in growth or export, indicating
that the HSP101^DDD phenotype is not an indirect result of destabilizing
the DDD in the PV compartment (Extended Data Fig. 7).

Figure 2 | HSP101 is required for export of PEXEL and PNEP proteins.
a, c, Immunofluorescence assay (IFA) of ring-stage 13F10 parasites with or
without TMP. TMP was removed in late schizont stage and parasites were
allowed to reinvade and grow 18-24 h before fixation with paraformaldehyde
(a) or acetone (c). a, IFA of the exported PEXEL-containing protein HRP2.
SERP is a marker for the PV. Export was scored as complete (no HRP2 signal
enrichment around the parasite as shown in the +TMP IFA), partial (HRP2
signal within the host cell but also enriched around the parasite) or no export
(HRP2 signal only seen around the parasite and not in the host cell, as shown in
the -TMP IFA). Error bars represent s.d. of three technical replicates. Data are
representative of five independent experiments. DIC, differential interference
contrast. b, Sequential fractionation of infected ring-stage parasites with or
without TMP analysed by western blot. The host cytosol was released with
tetanolysin (TTL) and subsequently the PV contents were released with
saponin (SAP). Blocked HRP2 is found in the PV fraction. Haemoglobin (Hb)
was detected by Coomassie staining and serves as a control for host cytosol
release. SERP serves as a control for PV release. BiP serves as a parasite integrity
control. Data are representative of two independent experiments. c, IFAs of the
PNEP REX1, which colocalizes with HSP101^DDD at the PVM in the absence of
TMP. d, e, IFA of trophozoite-stage 13F10 parasites with or without TMP. TMP
was removed in late ring stage and parasites were allowed to develop 12-24 h
before fixation with acetone. f, Live fluorescence imaging of 13F10 parasites
expressing a PFA660-GFP fusion and labelled with Bodipy TR Ceramide to
demarcate the PVM (other membranes are also labelled). TMP treatment as in
d, e. All scale bars, 5 μm. Images in c-f are representative of two independent
experiments.

Finally, we examined PfGECO, a PEXEL-containing HSP40 that is 
specifically expressed in gametocytes. Again, export of PfGECO across
the PVM was dramatically blocked in stage I gametocytes, showing an 
HSP101 requirement for protein export in sexual stages (Extended Data 
Fig. 4h). 

Parasite nutrient acquisition is supported by establishment of a Plas- 
modium surface anion channel (PSAC or new permeability pathways) 
at the late ring stage. This channel modifies the sensitivity of the infected 
RBC to osmotic lysis upon solute uptake. Following TMP removal before 
RBC invasion, arrested ring-stage HSP101^DDD parasites failed to lyse in
the presence of sorbitol (Fig. 3a). In contrast, sensitivity to osmotic lysis
developed normally when TMP was removed at the end of the ring stage
(Fig. 3b). CLAG3 proteins localize to the infected RBC periphery and are
the only parasite proteins known to be involved in PSAC activation.
Interestingly, we observed no change in CLAG3 localization to the host
periphery when export was blocked, indicating that additional exported
factors are required to activate PSAC (Fig. 3c). CLAG3 may be directly
secreted into the RBC membrane during invasion similar to other rhoptry
proteins, or could traffic from the PV by an HSP101-independent
export pathway. Nutrient uptake via PSAC is an essential process and 
block of its formation at the late ring stage may be the lethal event that 
follows HSP101 inhibition. 

Figure 3 | HSP101 is required for activation of PSAC but not trafficking of
CLAG3 to the RBC periphery. a, b, Osmotic lysis assay on 13F10 parasites.
Sorbitol-sensitivity of arrested, late ring stage (-TMP) and control (+TMP)
parasites at 25% parasitaemia (a) or later stages with or without TMP
magnet-purified to >95% parasitaemia (b) is shown. Error bars represent s.d. of
three technical replicates. Results are representative of two independent
experiments. c, IFA of ring-stage 13F10 parasites showing CLAG3 localization
to the RBC membrane with or without TMP. TMP was removed in the late
schizont stage and parasites were allowed to develop 18 h before fixation with
90% acetone/10% methanol. Scale bar, 5 μm. Images are representative of two
independent experiments.

We reasoned that a block to protein export might arise from the disordered
DDD interfering with interactions between HSP101^DDD and other
PTEX components. To test this, HSP101^DDD was immunoprecipitated
from parasite lysates. Remarkably, whereas localization of HSP101^DDD
and EXP2 was unchanged with or without TMP (Extended Data Fig. 8a-c),
their interaction decreased by more than 90% in the absence of TMP
(Fig. 4a, f). A similar loss of HSP101-EXP2 interaction (70% decrease)
was observed in reciprocal immunoprecipitation experiments (Fig. 4b, f).
In contrast, association of HSP101^DDD with RESA increased by more
than twofold (Fig. 4c, f and Extended Data Fig. 9a). To analyse PTEX150
interactions, we generated an HSP101^DDD strain expressing a second
copy of PTEX150 with a Flag tag. The fusion protein localized to the
PVM and distribution was unchanged with or without TMP (Extended
Data Fig. 8d, e). Interaction between HSP101^DDD and PTEX150 was
substantially decreased in the absence of TMP (84% decrease, Fig. 4d, f)
whereas interactions between PTEX150 and EXP2 were not affected
(Fig. 4e, f and Extended Data Fig. 9b). We conclude that destabilization
of the DHFR-domain fold results in HSP101^DDD dissociation from the
PTEX complex, producing a block in translocation of exported proteins
out of the PV (Fig. 4g). Increased association with RESA under these
conditions shows that interaction with exported substrates is not impaired.
Interestingly, we also observed increased interaction between EXP2
and RESA in the absence of TMP (Fig. 4c, f and Extended Data Fig. 9a),
suggesting that PTEX components can interact with substrates independent
of HSP101 but that HSP101 is needed to drive export, probably
through substrate unfolding and/or translocation (although we cannot
exclude the possibility that association of RESA and HSP101^DDD in the
absence of TMP results from ‘stickiness’ of the destabilized DDD).

Figure 4 | Inactivated HSP101^DDD dissociates from the PTEX complex.
a-c, Immunoprecipitation (IP) of PTEX components from lysates of 13F10
parasites. S, lysate supernatant input. E, elution. d, e, IP of PTEX components
from lysates of 13F10 parasites expressing a PTEX150-Flag fusion protein.
f, Quantification of IP data. Data are representative of three independent
experiments. g, Model for mechanism of export block following inactivation of
HSP101^DDD. HSP101 recognizes exported substrates and drives their
translocation across the PVM in conjunction with other PTEX components by
unfolding and/or directional threading. Destabilization of the DDD tag results
in dissociation of HSP101 from PTEX, blocking translocation. Continued
interaction of HSP101^DDD and EXP2 with exported substrates in these
conditions suggests substrate molecules may be trapped at various steps in the 
translocation process. 

These data demonstrate that HSP101 is required to mediate protein
export in P. falciparum. As HSP101 is a major component of PTEX, 
our results provide strong functional evidence for a PTEX role in protein 
translocation (Fig. 4g), although we cannot formally rule out the pos- 
sibility that these proteins perform some other critical activity upstream 
of actual translocation across the PVM. The essential role of HSP101 
in parasite survival and export of key virulence determinants provides 
impetus for pursuing PTEX as a drug target. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


Cell fractionation. TMP was removed from synchronous 13F10 parasites at the 
late schizont stage and development was allowed to proceed with or without 10 μM
TMP for 18 h to the ring stage. Ring-stage parasites were enriched to >95% parasitaemia
by an initial tetanolysin treatment of 2.5 haemolytic units (HU) for 20 min
as described. Samples were then fractionated by sequential treatment with tetanolysin
(1 HU for 20 min) and saponin (0.035% for 5 min) as described.
Parasite strains, culture and genetic modification. Parasite culture and flow cytometry
analysis were performed as described. Cloning was performed using the In-Fusion
system (Clontech) unless otherwise noted. For generation of HSP101^DDD
parasites, the DDD cassette was PCR-amplified from plasmid pGDB with primers
5'-GAGCAGCTGACCTAGGTACCCATACGATGTTCCAGATTACGCTTACC 
CATACGATGTTCCAGATTACGCTTACCCATACGATGTTCCAGATTACGC 
TATCAGTCTGATTGCGGCGTTAGCGGTAGATCACGTTATCGGC-3’ and 5'- 
TAACTCGACGCGGCCGTCATCGCCGCTCCAGAATCTCAAAGCAATAGCT 
GTGAGAG-3’ to incorporate a 5’ 3X HA tag. The amplicon was inserted into plasmid
pPM2GT between restriction sites AvrII and EagI. The BSD cassette was then
PCR-amplified from pGDB with primers 5’-ATGCCTGCAGGTCGATTGCATGCTT
AGCTAATTCGCTTG-3’ and 5'-GAGAAATCTAGAGGTACCGAGCTCGATC 
TGCCGGTCTCC-3’ and inserted between restriction sites SalI and BglII. Finally, a 
portion of the 3’ genomic locus of HSP101 up to but not including the stop codon 
was PCR-amplified from P. falciparum genomic DNA using primers 5’-ACGAT 
TTTTTCTCGAGCGAAAACTTTTATGGTATTAATATAACAGATAAAGCTT
TAGTAGCAGCAGC-3’ and 5'-CTGCACCTGGCCTAGGGGTCTTAGATAAG 
TTITATAACCAAGTTTTTAGCTTTACTATTATAATCAACAAATACATCC -3' 
and inserted between restriction sites XhoI and AvrII. The resulting plasmid pHSP101-HDB
was transfected into the 3D7-derived parental strain PM1KO, which contains
an hDHFR expression cassette conferring resistance to TMP. Selection, cycling and
cloning were performed as described. TMP was maintained in the medium throughout.
Two clones from independent transfections were isolated and designated 13F10
and 14G11. Integration was detected by PCR using the primers 5’-CCTCCTTCAG 
TAGATATGACCG-3' and 5’-CTAACGCCGCAATCAGACTG-3’ and confirmed 
by Southern blot using a probe to the 3’ end of the HSP101 gene, generated with the 
primers 5'-CGAAAACTTTTATGGTATTAATATAACAG-3’ and 5'-GGTCTTA 
GATAAGTTTATAACCAAG-3’.

For gene expression, the episomal expression plasmid pyEOE containing a yeast
dihydroorotate dehydrogenase selection cassette was modified for transposase-mediated
genomic integration by inserting a piggyBac element containing inverted
terminal repeats amplified from plasmid pXL-BACII-DHFR with primers 5’-GG
AATTTCCTTATAAGATCTTAATACGACTCACTATAGGGCGAATTGGG-3' 
and 5’-ATAATGGTTTCTTAGACGTCGATAAAAGTTTTGTTACTTTATAGA 
AGAAATTTTGAG-3’ between the restriction sites BglII and AatII, resulting in the 
plasmid pTyEOE. For expression of PFA660-GFP, the PFA660 coding sequence was 
PCR-amplified from parasite complementary DNA with primers 5'-ACGATTTTT 
TCTCGAGATGGCAACCTTAAGGAAAAGC-3’ and 5'-CTGCACCTGGCCTAG 
GATAACTCTCTTTAAATATCTCTTITATC-3’ and inserted into plasmid pTyEOE 
between restriction sites XhoI and AvrII, placing the gene under the control of the
HSP86 promoter and generating a C-terminal GFP fusion. For expression of the 
PV-targeted GFP-DDD fusion protein, sequence encoding GFP fused to DDD with 
a C-terminal HA tag was amplified from plasmid pGDB with the forward primer 
5'-ACCCCGGGATCTCGAGATGACAAGAAGATATTTAAAGTATTATATTT 
TTGTTACTTTATTGTTTTTTGTTCAAGTTATTAATAATGTATTGTGTGCT
CCTAGGGCAGCAAGTAAAGGAGAAGAACTTTTCACTGGAG-3’ (encoding 
the HSP101 signal peptide, amino acids 1-27) and the reverse primer 5’-TAACTC 
GACGCGGCCGTCAAGCGTAATCTGGAACATCGTATGGG-3’. This amplicon
was inserted in plasmid pTyEOE between the restriction sites XhoI and EagI.
For expression of the PTEX150-Flag fusion protein, sequence encoding a 6X Flag 
epitope was generated with the primer 5’-CTTGATAGACCTAGGGACTACAAGG 
ACGACGACGACAAGGATTATAAAGATGATGATGATAAAGATTATAAAG 
ATGATGATGATAAAGATTATAAAGATGATGATGATAAAGATTATAAAG 
ATGATGATGATAAAGATTATAAAGATGATGATGATAAATAACGGCCGC 
GTCGAGTT-3’ and its reverse complement. The primers were annealed and inserted 
between restriction sites AvrII/EagI in plasmid pTyEOE, resulting in plasmid pTyEOE-6xFlag.
The PTEX150 gene (which contains no introns) was then PCR-amplified
from parasite genomic DNA with its endogenous 5’ UTR up to but not including the 
stop codon using primers 5'-GATCGAGACGTCCTCTTTGTGGTCAAAATAAG 
TAAAATTTTATAAATTC-3’ and 5'-GTCACCTAGGATTGTCGTCCTCTTCTT 
CGTCC-3’ and inserted between restriction sites AatII/AvrII by standard ligation. 
pTyEOE-derived plasmids were co-transfected with plasmid pHTH for transient 
expression of the piggyBac transposase to mediate genomic integration.
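
The 6X Flag oligonucleotide described above can be sanity-checked in silico. The short Python sketch below is an illustration added for the reader and is not part of the original methods. It takes the primer sequence as printed, builds the reverse complement that would be annealed to it, and translates the reading frame that begins immediately after the AvrII site (CCTAGG) to count Flag epitopes (DYKDDDDK). The function names and the choice of reading frame are assumptions made for this example.

```python
# Illustrative in silico check of the 6X Flag oligonucleotide (not from the original methods).
# The sequence is the primer as printed in the text, joined across line breaks.
FLAG_PRIMER = (
    "CTTGATAGACCTAGGGACTACAAGG"
    "ACGACGACGACAAGGATTATAAAGATGATGATGATAAAGATTATAAAG"
    "ATGATGATGATAAAGATTATAAAGATGATGATGATAAAGATTATAAAG"
    "ATGATGATGATAAAGATTATAAAGATGATGATGATAAATAACGGCCGC"
    "GTCGAGTT"
)

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Assumption: the Flag reading frame starts directly after the AvrII site.
AVRII_SITE = "CCTAGG"
orf_start = FLAG_PRIMER.index(AVRII_SITE) + len(AVRII_SITE)
flag_peptide = translate(FLAG_PRIMER[orf_start:])
flag_copies = flag_peptide.count("DYKDDDDK")

# Second strand that would be annealed to the primer.
bottom_strand = reverse_complement(FLAG_PRIMER)
```

Here `flag_copies` gives the number of tandem epitopes encoded between the AvrII site and the stop codon, and `bottom_strand` is the complementary oligonucleotide. The same two helpers can be applied to any of the other primers listed in this section.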
Growth assays. For asynchronous growth assays, parasites were subcultured 1:2 
each day to avoid confounding effects from high parasite density in making growth 
rate comparisons between different samples. Parasitaemia at each time point was
back calculated based on the subculturing schedule. Data were fit to exponential
growth equations using Prism (GraphPad Software, Inc.). In synchronous growth 
assays, daily media changes were performed without subculture. Parasitaemia was 
determined by flow cytometry. 
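
The back-calculation and exponential fit described for asynchronous growth assays can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis used in the paper: the readings are hypothetical, each reading is assumed to be taken just before that day's 1:2 split, and a least-squares line through the log-transformed values is used in place of the nonlinear fit performed in Prism.

```python
import math


def back_calculate(measured, dilution=2.0):
    """Convert daily readings from a culture split 1:`dilution` after each
    reading into the parasitaemia an undiluted culture would have reached."""
    return [value * dilution ** day for day, value in enumerate(measured)]


def fit_exponential(days, values):
    """Least-squares line through (day, ln value).

    Returns (p0, k) for the model p(t) = p0 * exp(k * t).
    """
    logs = [math.log(v) for v in values]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    k = sxy / sxx
    p0 = math.exp(mean_y - k * mean_t)
    return p0, k


# Hypothetical parasitaemia readings (%), one per day, culture split 1:2 daily.
days = [0, 1, 2, 3]
measured = [1.0, 1.5, 2.25, 3.375]

cumulative = back_calculate(measured)
p0, rate = fit_exponential(days, cumulative)
doubling_time_days = math.log(2) / rate
```

Comparing `rate` (or `doubling_time_days`) between +TMP and -TMP cultures gives a single number per condition, which is the kind of comparison that daily subculturing is meant to keep free of density effects.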

Gametocyte induction. Synchronous 13F10 parasites at 5-7% parasitaemia were 
stressed by increasing haematocrit to 4% at the mid-trophozoite stage for ~12 h.
Haematocrit was returned to 2% at the schizont stage and TMP washout was performed
at late schizont stage shortly before rupture and at 24 h intervals thereafter.
Asexual parasites were killed by treatment with 50 mM N-acetylglucosamine in 
the following cycle. 

Antibodies. The following antibodies were used for IFA and western blot (WB): 
rabbit polyclonal anti-HA (Life Technologies) (IFA: 1:100); rabbit polyclonal anti-HA
(Sigma) (WB: 1:1,000); rat anti-HA mAb clone 3F10 (Roche) (IFA: 1:100, WB:
1:1,000); mouse anti-Flag mAb clone M2 (Sigma) (IFA: 1:100, WB: 1:500); mouse
anti-EXP2 mAb clone 7.7 (IFA: 1:1,000, WB: 1:1,000); mouse anti-HRP2 mAb 2G12
(IFA: 1:1,000, WB: 1:500); mouse anti-RESA mAb clone 28/2 (IFA: 1:1,000, WB:
1:1,000); rabbit polyclonal anti-BiP (WB: 1:1,000); rabbit polyclonal anti-REX1
(IFA: 1:1,000); mouse polyclonal anti-REX2 (IFA: 1:500); mouse polyclonal anti-REX3
(IFA: 1:500); rabbit polyclonal anti-SBP1 (IFA: 1:500); mouse polyclonal
anti-MSRP6 (IFA: 1:500); mouse polyclonal anti-PfGECO (IFA: 1:300); rabbit
polyclonal anti-Pfs16 (IFA: 1:500); rabbit polyclonal anti-PfEMP1 ATS (IFA:
1:200); mouse polyclonal anti-CLAG3 (IFA: 1:300); rabbit polyclonal anti-PM2
antibody 737 (WB: 1:2,000); rabbit polyclonal anti-KAHRP (IFA: 1:500); rabbit
polyclonal anti-SERP (IFA: 1:500, WB: 1:1,000).

Light microscopy and image processing. For indirect IFA detecting HRP2, REX3, 
PfGECO and Pfs16, cells were allowed to settle on coverslips coated with conca- 
navalin A and fixed with a mixture of 4% paraformaldehyde and 0.0075% glutar- 
aldehyde in PBS. For indirect IFA detecting HA and Flag epitopes, EXP2, RESA, 
REX1, REX2, SBP1, MSRP6, PfEMP1, and KAHRP, thin smears were air dried and 
fixed with room temperature acetone. For indirect IFA detecting CLAG3 and SBP1, 
thin smears were air dried and fixed with ice-cold 90% acetone/10% methanol. Pri- 
mary antibodies were detected by Alexa Fluor 488 or 594 secondary IgG antibodies 
(Life Technologies) used at 1:2,000. Image stacks were collected at z-increments of 
0.2 μm with an ORCA-ER CCD camera (Hamamatsu) and AxioVision software on
an Axio Imager.M1 microscope (Zeiss) using a ×100 oil immersion objective. Deconvolved
images were generated using manufacturer specified point-spread functions
and displayed as maximum intensity projections. Adjustments to brightness
and contrast were made for display purposes. Live imaging was performed as previously
described. All IFA and live imaging experiments were repeated at least twice.
Immunoelectron microscopy. For immunolocalization at the ultrastructural level, 
infected RBCs were fixed in 4% paraformaldehyde/0.05% glutaraldehyde (Polysciences 
Inc.) in 100 mM PIPES/0.5 mM MgCl2, pH 7.2 for 1 h at 4 °C. Samples were then embedded
in 10% gelatin and infiltrated overnight with 2.3 M sucrose/20% polyvinyl
pyrrolidone in PIPES/MgCl2 at 4 °C. Samples were trimmed, frozen in liquid nitrogen,
and sectioned with a Leica Ultracut UCT cryo-ultramicrotome (Leica Microsystems 
Inc.). 50-nm sections were blocked with 5% fetal bovine serum/5% normal goat serum 
for 30 min and subsequently incubated with primary antibodies followed by sec- 
ondary anti-rabbit conjugated to 18 nm colloidal gold (Jackson ImmunoResearch 
Laboratories, Inc.). Sections were washed in PIPES buffer followed by a water rinse, 
and stained with 0.3% uranyl acetate/2% methyl cellulose. Samples were viewed 
with a JEOL 1200EX transmission electron microscope (JEOL USA Inc.) equipped 
with an AMT 8 megapixel digital camera (Advanced Microscopy Techniques). All 
labelling experiments were conducted in parallel with controls omitting the prim- 
ary antibody. These controls were consistently negative at the concentration of 
colloidal gold conjugated secondary antibodies used in these studies. 

Assessment of export in recently invaded parasites. To monitor RESA export imme- 
diately after invasion, synchronous late schizonts were purified over Percoll and 
then mixed with fresh RBCs in pre-warmed media. Cultures were shaken at 37 °C 
for one hour. After lysis of unruptured schizonts by sorbitol treatment, samples 
were fixed for IFA and stained with RESA antibodies as described above. 
Osmotic lysis assays. Osmotic lysis assays were performed as described by incubating
cells in 290 mM D-sorbitol and measuring haemoglobin release by absorbance
at 405 nm. Samples were prepared as follows: TMP was removed or not from
synchronous, late schizont-stage 13F10 parasites and development was allowed to
proceed for 24 h, resulting in 25% parasitaemia cultures of early trophozoites
(+TMP) or arrested late ring-forms (-TMP). Alternatively, TMP was removed
or not from synchronous 13F10 parasites at the late ring-stage and development
was allowed to proceed 24 h before magnet purification to >95% parasitaemia. In
either case, the final concentration of cells in each sample was 1x10°/ml. At the end 
of the assay, samples were lysed with saponin and data were normalized to these 
values (100% lysis). 
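
The normalization described above, in which each sorbitol reading is expressed relative to the saponin-lysed value for the same sample, can be sketched in a few lines of Python. The absorbance values below are hypothetical, no blank subtraction is applied, and the use of the sample standard deviation across three technical replicates is an assumption; the source states only that data were normalized to saponin lysis (100%).

```python
from statistics import mean, stdev


def percent_lysis(a405_sorbitol, a405_saponin):
    """Haemoglobin release in sorbitol as a percentage of total (saponin) lysis."""
    return 100.0 * a405_sorbitol / a405_saponin


def summarize(replicates):
    """Mean and sample s.d. of percent lysis across technical replicates.

    `replicates` is a list of (a405_sorbitol, a405_saponin) pairs.
    """
    values = [percent_lysis(sorbitol, saponin) for sorbitol, saponin in replicates]
    return mean(values), stdev(values)


# Hypothetical A405 readings for three technical replicates of one condition.
plus_tmp = [(0.40, 0.80), (0.44, 0.80), (0.36, 0.80)]
mean_lysis, sd_lysis = summarize(plus_tmp)
```

Running `summarize` once per condition and time point yields the mean and error-bar values of the kind plotted for the +TMP and -TMP samples.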

Immunoprecipitation. For immunoprecipitation, TMP was removed or not from 
synchronous parasites at the early trophozoite-stage and parasites were allowed to 
develop for 24 h to late schizonts and early rings. Parasites were then harvested
with 0.035% saponin in PBS and lysed in 0.5% Triton X-100 in PBS. Following sonication,
lysates were cleared by centrifugation at 16,000g for 5 min and supernatants
were nutated for 3 h at 4 °C with indicated antibodies and Dynabeads coupled to
protein G or protein A (or Dynabeads alone as a negative control when necessary)
(Life Technologies). Immune complexes were purified on magnets with extensive
washing with 0.5% Triton X-100 in PBS and then lysed in sample buffer.

Western blots. Western blot analysis was performed using an Odyssey infrared 
imaging system (LI-COR Biosciences). Primary antibodies were detected by IRDye 
680 and 800-conjugated secondary antibodies (LI-COR Biosciences) used at 1:15,000. 
Quantitative measurements were made with Image Studio software (LI-COR Biosciences). 
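
One way to turn band intensities from such blots into the percentage changes in interaction reported in the main text is sketched below in Python. This is an assumption-laden illustration, not the authors' stated procedure: the intensities are hypothetical, and normalizing the co-purified (prey) band to the immunoprecipitated (bait) band in the same elution lane is a choice made for this example.

```python
def coip_ratio(prey_signal, bait_signal):
    """Co-purified protein signal relative to the immunoprecipitated bait signal."""
    return prey_signal / bait_signal


def percent_change(ratio_minus_tmp, ratio_plus_tmp):
    """Change in the co-IP ratio without TMP, relative to the +TMP control."""
    return 100.0 * (ratio_minus_tmp - ratio_plus_tmp) / ratio_plus_tmp


# Hypothetical integrated band intensities from the elution lanes.
ratio_plus = coip_ratio(prey_signal=8000.0, bait_signal=10000.0)
ratio_minus = coip_ratio(prey_signal=600.0, bait_signal=10000.0)
change = percent_change(ratio_minus, ratio_plus)
```

A negative `change` corresponds to a loss of interaction in the absence of TMP and a positive value to a gain. Normalizing to the bait keeps differences in pull-down efficiency between lanes from being read as changes in binding.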

Extended Data Figure 1 | Generation of HSP101^DDD strains. a, Schematic of
strategy to generate HSP101^DDD parasites. 3’F, 3’ flank for homologous
recombination; BSD, blasticidin S deaminase; TMP, trimethoprim; HA,
haemagglutinin tag; DDD, DHFR destabilization domain. b, Diagnostic PCR
showing integration of the DDD fusion in the two independent clones 13F10
and 14G11. Primers shown as black arrows in a. Image is representative of two
independent experiments. c, Southern blot showing integration of plasmid
pHSP101-HDB occurred at the intended genomic locus. Expected NcoI
digestion products and sizes are indicated in blue, red and orange in a. The 7.3-kb
band in 13F10 and 14G11 indicates the presence of concatemers commonly
observed in P. falciparum. Image is representative of one experiment.
d, Western blot with anti-HA antibodies detects a 124 kDa band in clones
13F10 and 14G11. Image is representative of two independent experiments.
e, IFA of acetone-fixed 13F10 parasites showing colocalization of HSP101^DDD
and EXP2 at the PVM. Scale bar, 5 μm. Images are representative of two
independent experiments.

Extended Data Figure 2 | Characterization of HSP101^DDD strains. a, TMP
dose-response for 13F10 and 14G11 parasites grown for 96 h was measured.
Error bars represent s.d. of three technical replicates. b, Western blot on lysates
from asynchronous 13F10 parasites grown with or without TMP for 24 h. BiP
serves as a loading control. Blotting with anti-HA antibody shows no decrease
in HSP101^DDD protein levels relative to the BiP loading control. Images are
representative of two independent experiments. c, Quantification of new ring
parasitaemia with or without TMP. In a parallel experiment to that shown in
Fig. 1d (but beginning with lower parasitaemia cultures), new rings were
counted in Giemsa-stained smears of synchronous 13F10 parasites where TMP
was removed or not during the preceding trophozoite stage. No significant
difference in the resulting new ring parasitaemia was observed with or without
TMP, indicating no difference in re-invasion efficiency without TMP. Error bars
represent s.d. of three technical replicates. Data are representative of two
independent experiments. d, Giemsa-stained smears of synchronous 13F10
parasites grown with or without TMP from Fig. 1d. Reintroduction of TMP
after 24 or 48 h restored progression through the intraerythrocytic cycle. After
72 h without TMP, most parasites appeared as dead, pyknotic forms. Images are
representative of three independent experiments. Original magnification
×1,000.

Extended Data Figure 3 | HSP101 function is critical for early stage
gametocyte development. Giemsa-stained smears from day nine post
gametocyte induction as quantified in Fig. 1e. TMP was removed from parallel
samples at 24-h intervals beginning just before reinvasion (0 h) following
gametocytogenesis induction. Data are representative of 4 independent
experiments. While control (+TMP) gametocytaemia varied between
experiments, a similar effect on gametocyte formation was observed in each
experiment following TMP removal. Original magnification ×1,000.

Extended Data Figure 4 | Localization and quantification of exported
proteins following HSP101^DDD inactivation. a-f, IFA and immunoelectron
microscopy of ring-stage 13F10 parasites with or without TMP. TMP was
removed in late schizont stage and parasites were allowed to reinvade and grow
18-24 h before fixation with paraformaldehyde (a) or acetone (b-d, f).
Immunoelectron microscopy fixation (e) is detailed in methods. A similar 
export block with accumulation of the exported protein at the parasite 
periphery was observed in each case when TMP was removed. The soluble 
PEXEL-containing protein REX3 (a) is normally exported into the host RBC 
cytosol. The PEXEL-containing KAHRP protein (b) is normally exported 
through Maurer’s clefts to knob structures at the cytoplasmic face of the 
infected RBC membrane. The PEXEL-containing protein RESA (c) is normally 
exported to the RBC periphery. SERP is a marker for the PV. SBP1 (d) is an 
integral membrane PNEP normally exported to the Maurer’s clefts. In the 
absence of TMP, blocked SBP1 colocalizes with EXP2. e, Immunoelectron 
microscopy showing localization of SBP1 in ring-stage parasites with or 
without TMP. MC, Maurer’s cleft. Scale bars, 500 nm. Images are representative
of one experiment. REX2 (f) is an integral membrane PNEP normally exported 
to the Maurer’s clefts. g, Quantification of export block by IFA for exported 
proteins shown here and in Fig. 2c-f. Export was scored as complete (all signal 
in the host cell), partial (signal within the host cell but also within the PV) or 
no export (signal only seen within the PV and not in the host cell). Example 
images of each scoring scenario are given for REX1 in Extended Data Fig. 5a. In 
the case of PfEMP1, cells were scored as having PfEMP1 signal at the RBC 
periphery (complete export) or not (no export) due to the fact that some 
PfEMP1 signal is always seen within the PV under normal export conditions 
(see Fig. 2e, + TMP). Error bars represent s.d. of three technical replicates. Data 
are representative of at least two independent experiments. h, IFA of PfGECO
in paraformaldehyde-fixed, stage I gametocytes 36 h post invasion with or
without TMP. Pfs16 is a gametocyte-specific PVM marker. All IFA scale bars,
5 μm. All IFA images (a-d, f, h) are representative of two independent
experiments. 

Extended Data Figure 5 | Reactivation of export does not require new
protein synthesis. a, b, TMP was removed in late schizonts and parasites were
allowed to reinvade and develop for 18 h before TMP add back with or
without 10 μg ml^-1 cycloheximide (CHX). Parasites were acetone-fixed 24 h
later and processed for IFA. Export was scored as complete (no REX1 retained
within the EXP2-labelled PVM), partial (REX1 in the host cell and retained
within the PVM) or no export (no REX1 signal beyond the PVM). Error bars
represent s.d. of three technical replicates. Data are representative of two
independent experiments. Scale bar, 5 μm. c, Metabolic labelling with
[35S]methionine/cysteine, performed as previously described, confirms that
CHX treatment conditions inhibit new protein synthesis. Parasite proteins
were TCA-precipitated and incorporated radioactivity was determined
through scintillation counting. Error bars represent s.d. of three technical
replicates.

Extended Data Figure 6 | Maturation of plasmepsin 2 is not affected by
inactivation of HSP101. Western blot showing normal maturation of
plasmepsin 2 (PM2) in asynchronous parasites after 24 h -TMP. BiP serves as a
loading control. Maturation requires proPM2 trafficking through the PV before
internalization to the digestive vacuole where maturation occurs. Images are
representative of two independent experiments. 

Extended Data Figure 7 | DDD targeted to the PV independent of HSP101
does not interfere with parasite growth or export. a, Schematic of the PV- 
targeted GFP-DDD fusion protein consisting of a signal peptide appended to a 
GFP-DDD fusion with a C-terminal HA epitope tag and expressed under the 
control of the HSP86 promoter. The predicted size of the fusion protein after 
signal peptide cleavage is 46 kDa. Live imaging of GFP demonstrates PV 
targeting of the fusion protein. Images are representative of two independent 
experiments. b, Western blot showing GFP^DDD is not degraded in the absence
of TMP. Synchronized GFP^DDD parasites were grown 72 h with or without
TMP before purification over Percoll and treatment with tetanolysin to release
the RBC cytosol but not the PV contents. The GFP^DDD fusion was detected with
anti-HA antibodies. BiP serves as a loading control. Images represent one
experiment. c, Growth analysis of asynchronous GFP^DDD parasites shows no
growth defect -TMP. Error bars represent s.d. of three technical replicates.
Data are representative of two independent experiments. d, IFA showing no
defect in export in GFP^DDD parasites in the absence of TMP. No difference was
observed in export of ring-specific (REX1) and the trophozoite-specific
(MSRP6) exported proteins in the presence or absence of TMP. All scale bars,
5 μm. Images represent one experiment.

Extended Data Figure 8 | Localization of PTEX components is unchanged
under export blocking conditions. a, Immunoelectron microscopy showing
localization of HSP101^DDD in ring-stage parasites with or without TMP. TMP
was removed or not from synchronous late schizonts and parasites were 
allowed to re-invade and develop for 18h before fixation. Scale bars, 500 nm. 
Images represent one experiment. b, c, IFA of 13F10 parasites showing 
co-localization between HSP101^DDD or EXP2 and the PV marker SERP with or
without TMP, indicating that localization of these PTEX components is not
altered under export blocking conditions. TMP treatment was performed as in
a before fixation with acetone and processing for IFA. d, e, IFA of ring-stage 
13F10 parasites expressing a PTEX150-Flag fusion. The upper panel (d) shows 
co-localization with HSP101^DDD. The lower two panels (e) show that PTEX150
remains at the PVM during export block and partially colocalizes with 
blocked SBP1. TMP treatment and fixation as in b, c. All IFA scale bars, 5 μm.
All IFA images (b-e) are representative of two independent experiments. 

Extended Data Figure 9 | Immunoprecipitation bead controls indicate 
target-specific interactions. Replicate IP experiments to those shown in 
Fig. 4c, e with bead controls included. a, b, Purification of RESA by HSP101ᴰᴰᴰ 
or EXP2 (a) and purification of EXP2 by PTEX150-Flag (b) is specific to the target 
antibodies. IP experiments were performed in parallel by incubating equivalent 
portions of lysate supernatant input with beads alone or beads and the 
indicated IP antibodies. S, supernatant input. E, elution. Data are representative 
of two independent experiments. 

Equalizing excitation-inhibition ratios across visual cortical neurons 

Mingshan Xue, Bassam V. Atallah & Massimo Scanziani 


The relationship between synaptic excitation and inhibition (E/I ratio), 
two opposing forces in the mammalian cerebral cortex, affects many 
cortical functions such as feature selectivity and gain. Individual 
pyramidal cells show stable E/I ratios in time despite fluctuating cortical 
activity levels. This is because when excitation increases, inhibition 
increases proportionally through the increased recruitment of 
inhibitory neurons, a phenomenon referred to as excitation-inhibition 
balance. However, little is known about the distribution of E/I ratios 
across pyramidal cells. Through their highly divergent axons, inhibitory 
neurons indiscriminately contact most neighbouring pyramidal 
cells. Is inhibition homogeneously distributed or is it individually 
matched to the different amounts of excitation received by distinct 
pyramidal cells? Here we discover that pyramidal cells in layer 
2/3 of mouse primary visual cortex each receive inhibition in a sim- 
ilar proportion to their excitation. As a consequence, E/I ratios are 
equalized across pyramidal cells. This matched inhibition is medi- 
ated by parvalbumin-expressing but not somatostatin-expressing 
inhibitory cells and results from the independent adjustment of 
synapses originating from individual parvalbumin-expressing cells 
targeting different pyramidal cells. Furthermore, this match is activity- 
dependent as it is disrupted by perturbing pyramidal cell activity. 
Thus, the equalization of E/I ratios across pyramidal cells reveals an 
unexpected degree of order in the spatial distribution of synaptic 
strengths and indicates that the relationship between the cortex’s two 
opposing forces is stabilized not only in time but also in space. 

To determine the distribution of E/I ratios among layer 2/3 neigh- 
bouring pyramidal cells (Fig. 1a), we used adeno-associated virus (AAV) 
to conditionally express channelrhodopsin-2 (ChR2) in Scnn1a-Cre-Tg3 
mice and photoactivated layer 4 excitatory neurons, one of the 
main sources of synaptic excitation to layer 2/3, in acute visual cortical 
slices (Extended Data Fig. 1). We compared the E/I ratios between two 
to four simultaneously recorded layer 2/3 pyramidal cells (inter-soma 
distance 39.4 ± 2.5 μm, mean ± s.e.m.; Extended Data Fig. 2) voltage 
clamped alternately at the reversal potential for synaptic inhibition 
and excitation to isolate excitatory postsynaptic currents (EPSCs) and 
disynaptic inhibitory postsynaptic currents (IPSCs), respectively. EPSC 
amplitudes greatly varied between simultaneously recorded neurons 
and so did IPSC amplitudes (Fig. 1b). Despite the heterogeneous distributions 
of EPSC and IPSC amplitudes among pyramidal cells, however, 
we found a strong correlation between their amplitudes. That is, 
neurons with larger EPSCs also received larger IPSCs (Fig. 1c, e). As a 
consequence, the distribution of E/I ratios across pyramidal cells var- 
ied much less than the distributions of EPSC and IPSC amplitudes 
(Fig. 1d, f) and much less than if EPSCs and IPSCs were randomly 
paired between cells (Extended Data Fig. 2). These data indicate that 
E/I ratios are equalized across pyramidal cells. 
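
The comparison described above can be sketched numerically. The following is a minimal illustration rather than the analysis code used in the study: the amplitudes are made-up values, the function names are hypothetical, and the average relative deviation is assumed here to be the mean absolute deviation from the group mean divided by that mean (the exact definition is given in the Methods).

```python
import random
from statistics import mean

def normalize(values):
    """Divide each value by the mean of the simultaneously recorded values."""
    m = mean(values)
    return [v / m for v in values]

def avg_relative_deviation(values):
    """Assumed definition: mean absolute deviation from the mean, relative to the mean."""
    m = mean(values)
    return mean(abs(v - m) for v in values) / m

def ei_ratios(epscs, ipscs):
    """E/I ratio of each cell: EPSC amplitude divided by IPSC amplitude."""
    return [e / i for e, i in zip(epscs, ipscs)]

def shuffled_ratio_deviation(epscs, ipscs, n_shuffles=1000, seed=0):
    """Mean deviation of E/I ratios when IPSCs are randomly re-paired with EPSCs."""
    rng = random.Random(seed)
    devs = []
    for _ in range(n_shuffles):
        permuted = ipscs[:]
        rng.shuffle(permuted)
        devs.append(avg_relative_deviation(ei_ratios(epscs, permuted)))
    return mean(devs)

# Hypothetical amplitudes (pA) for four simultaneously recorded cells.
epscs = [100.0, 220.0, 350.0, 520.0]
ipscs = [210.0, 400.0, 760.0, 1000.0]

dev_epsc = avg_relative_deviation(epscs)
dev_ipsc = avg_relative_deviation(ipscs)
dev_ratio = avg_relative_deviation(ei_ratios(epscs, ipscs))
dev_shuffled = shuffled_ratio_deviation(epscs, ipscs)
```

With correlated EPSC and IPSC amplitudes such as these, `dev_ratio` comes out smaller than `dev_epsc`, `dev_ipsc` and `dev_shuffled`, which is the signature of equalized E/I ratios described in the text.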

This equalization could occur if distinct layer 2/3 pyramidal cells each 
receive inhibition from a ‘private’ set of inhibitory neurons such that the 
excitatory afferents that more strongly excite a pyramidal cell also more 


strongly excite its private inhibitory neurons. However, the two classes 
of inhibitory neurons, parvalbumin-expressing (Pvalb) and somatostatin- 
expressing (Sst) cells, that provide most inhibition to layer 2/3 pyramidal 

Figure 1 | Equalized E/I ratios across pyramidal cells. a, Two alternative 
models for the spatial distribution of E/I ratios across pyramidal cells. Top, the 
divergent axons of inhibitory neurons (IN) homogeneously inhibit 
neighbouring pyramidal cells. Pyramidal cells receiving more excitation have a 
larger E/I ratio. Bottom, despite divergent axons, inhibitory neurons generate 
larger inhibition in pyramidal cells receiving more excitation. Accordingly, 
E/I ratios are equalized across pyramidal cells. b, Left, schematic of 
experiments. Scnn1a-Cre-Tg3 mice with ChR2 in layer 4 excitatory neurons. 
Right, monosynaptic EPSCs and disynaptic IPSCs from four simultaneously 
recorded layer 2/3 pyramidal cells (Pyr) in response to layer 4 photoactivation. 
Note larger IPSCs in neurons receiving larger EPSCs. c, EPSC amplitudes 
of the four neurons in b plotted against their IPSC amplitudes. Left, absolute 
amplitudes. Right, normalized amplitudes. EPSC (or IPSC) amplitudes are 
normalized by the mean of the simultaneously recorded EPSC (or IPSC) 
amplitudes. Lines, linear regression fits. d, Distributions of normalized EPSC 
and IPSC amplitudes, and of normalized E/I ratios for the experiment in b. E/I 
ratios are normalized by the mean of the simultaneously recorded ratios. Note 
narrower distribution of E/I ratios compared with EPSCs or IPSCs. For 
computing average relative deviations, see Methods. e, Summary graphs of 
normalized EPSCs and IPSCs from 20 similar experiments (n = 51 cells). Line, 
linear regression fit (R² = 0.65, P < 0.0001). f, Summary graphs of average 
relative deviations from 20 similar experiments. Bars, mean ± s.e.m. The 
average relative deviations of E/I ratios are 50% smaller than those of EPSCs 
(P < 0.0001) or IPSCs (P < 0.0001). g, Left, schematic of experiments. Right, 
connectivity rates from Pvalb and Sst cells to pyramidal cells. 

cells showed broad connectivity with pyramidal cells (97% and 93%, 
respectively, Fig. 1g), as previously shown, thus precluding the private 
connectivity. 

Alternatively, the correlation between excitation and inhibition could 
be an artefact of the slicing procedure, whereby damaged neurons receive 
less excitation and less inhibition. To address this possibility we used an 
independent marker to identify neurons receiving more excitation. We 
used mice in which the promoter of the activity-dependent immediate 
early gene Fos drives the expression of Fos fused to the enhanced green 
fluorescent protein (Fos-EGFP), because in these mice EGFP⁺ neurons 
receive more excitation than EGFP⁻ neurons. EGFP⁺ neurons 
were predominantly pyramidal cells (Extended Data Fig. 3). We photostimulated 
layer 4 in acute slices from Fos-EGFP, Scnn1a-Cre-Tg3 mice 
and simultaneously recorded pairs of EGFP⁺ and nearby EGFP⁻ layer 
2/3 pyramidal cells. Layer 4 activation generated larger EPSCs in EGFP⁺ 
neurons in 78% of all recorded pairs, and EGFP⁺ neurons received, on 
average, 40% larger EPSCs (Fig. 2a, b) (the average logarithm of EGFP⁺/EGFP⁻ 
ratios was 0.15). Importantly, EGFP⁺ neurons also received 
larger disynaptic IPSCs (Fig. 2a, c). Consequently, the E/I ratios of 
EGFP⁺ and EGFP⁻ neurons were similar (Fig. 2d). 
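
The two figures quoted above, 40% larger EPSCs and an average log ratio of 0.15, describe the same effect if the logarithm is taken to base 10 (an assumption; the base is not stated in this passage), since 10^0.15 ≈ 1.41. The sketch below shows the conversion; the function names and the example pair amplitudes are hypothetical.

```python
import math

def mean_log10_ratio(pos, neg):
    """Average of log10(EGFP+ / EGFP-) over simultaneously recorded pairs."""
    logs = [math.log10(p / n) for p, n in zip(pos, neg)]
    return sum(logs) / len(logs)

def percent_larger(mean_log_ratio):
    """Convert a mean log10 ratio to a percentage difference (geometric mean)."""
    return (10 ** mean_log_ratio - 1) * 100

# The reported average log ratio of 0.15 corresponds to roughly 40% larger EPSCs.
reported = percent_larger(0.15)

# Hypothetical EPSC amplitudes (pA) for three EGFP+/EGFP- pairs.
egfp_pos = [280.0, 150.0, 420.0]
egfp_neg = [200.0, 120.0, 260.0]
example = percent_larger(mean_log10_ratio(egfp_pos, egfp_neg))
```

Averaging log ratios rather than raw ratios treats a twofold increase and a twofold decrease symmetrically, which is presumably why the pairwise comparisons are summarized on a logarithmic scale.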

Taken together, these results demonstrate that excitation and inhi- 
bition, despite varying in amplitudes between pyramidal cells, remain 
proportional to each other, thus equalizing E/I ratios. 

Which type of interneuron provides the inhibition that matches layer- 
4-mediated excitation? We took advantage of the fact that EGFP⁺ neurons 
in Fos-EGFP mice receive larger excitation from layer 4 and crossed 
them to Pvalb-ires-Cre or Sst-ires-Cre mice to express ChR2 conditionally. 
Photoactivation of Pvalb cells generated larger monosynaptic IPSCs 
in EGFP⁺ than in EGFP⁻ neurons (Fig. 2e, f). In contrast, Sst cells generated 
IPSCs whose amplitudes did not correlate with EGFP expression 
(Fig. 2g, h). These data indicate that Pvalb cells, but not Sst cells, provide 
stronger inhibition onto neurons that receive stronger layer-4-mediated 
excitation, thereby contributing to the equalization of E/I ratios. 

What mechanism regulates the strengths of excitation and/or inhi- 
bition to achieve the observed proportionality? Excitation and inhibi- 
tion may reach their specific ratio by using the pyramidal cell’s activity 
as a measure of their relative strengths. For example, the low activity 
caused by a strong Pvalb-cell-mediated inhibition or by a weak layer-4- 
mediated excitation could be the signal to increase layer-4-mediated 
excitation or to decrease Pvalb-cell-mediated inhibition, respectively, 
until a neuron’s specific higher set-point activity is reached. In both 
possibilities the initially small E/I ratio is increased by either increasing 
excitation to match the large inhibition or by decreasing inhibition to 
match the small excitation. Both possibilities are plausible since the 
activity of individual neurons can regulate the strengths of both excitatory 
and inhibitory synapses. If this hypothesis is correct, perturbing 
the activity of pyramidal cells should disrupt the proportionality 
between excitation and inhibition. For example, reducing the excitabil- 
ity of a pyramidal cell should increase its E/I ratio by either increasing 
excitation (the first possibility), or decreasing inhibition (the second pos- 
sibility), or both. 

We reduced the excitability of a small, random subset of layer 2/3 
pyramidal cells in primary visual cortex (V1) by overexpressing a Kir2.1 
channel via in utero electroporation (IUE) (Fig. 3a). Recordings in 
acute slices confirmed the reduced excitability in Kir2.1-overexpressing 
cells (Kir2.1 neurons) compared with untransfected control pyramidal 
cells (Extended Data Fig. 4). In vivo targeted recordings from Kir2.1 
and nearby control neurons (Fig. 3b, c) demonstrated that Kir2.1 over- 
expression drastically suppressed visual-evoked and spontaneous activ- 
ity (Fig. 3d-f). We then examined the impact of this perturbation on 
excitation and inhibition. We photostimulated layer 4 and simultaneously 
recorded Kir2.1 and neighbouring control neurons in the acute slices 
from Scnn1a-Cre-Tg3 mice. Surprisingly, layer-4-mediated excitation 
was not significantly different between these two groups (Fig. 3g, h), 
invalidating the first aforementioned possibility. In contrast, disynap- 
tic inhibition was significantly smaller in Kir2.1 neurons (Fig. 3g, i), 

Figure 2 | Pvalb-cell-mediated inhibition matches layer-4-mediated 
excitation. a, Left, schematic of experiments. Fos-EGFP, Scnn1a-Cre-Tg3 mice 
with ChR2 in layer 4 excitatory neurons. Right, monosynaptic EPSCs and 
disynaptic IPSCs from simultaneously recorded EGFP⁻ and EGFP⁺ neurons 
in response to layer 4 photoactivation. Note larger synaptic currents in EGFP⁺ 
neuron. b-d, Summary graphs of 37 similar experiments. b, Left, EPSC 
amplitudes in EGFP⁺ neurons plotted against those in EGFP⁻ neurons. Right, 
logarithm of the ratio between EPSC amplitudes in EGFP⁺ and EGFP⁻ 
neurons. Red, mean ± s.e.m. EPSC amplitudes are 40% larger in EGFP⁺ 
neurons (P = 0.0004). c, As in b, but for disynaptic IPSCs. Disynaptic IPSC 
amplitudes are 30% larger in EGFP⁺ neurons (P = 0.001). d, As in b, but 
for E/I ratios. E/I ratios are similar between EGFP⁺ and EGFP⁻ neurons 
(P = 0.7). e, Left, schematic of experiments. Fos-EGFP, Pvalb-ires-Cre mice 
with ChR2 in Pvalb cells. Right, IPSCs from simultaneously recorded EGFP⁻ 
and EGFP⁺ neurons in response to Pvalb cell photoactivation. Note larger IPSC 
in EGFP⁺ neuron. f, Summary graph. Left, IPSC amplitudes in EGFP⁺ neurons 
plotted against those in EGFP⁻ neurons. Right, logarithm of the ratio between 
IPSC amplitudes in EGFP⁺ and EGFP⁻ neurons. Red, mean ± s.e.m. IPSC 
amplitudes are 77% larger in EGFP⁺ neurons (n = 49, P = 0.001). g, h, As in 
e, f, but for Fos-EGFP, Sst-ires-Cre mice with ChR2 in Sst cells. 
IPSC average amplitudes are similar between EGFP⁺ and EGFP⁻ neurons 
(n = 27, P = 0.7). 

consistent with the second possibility. The effect on inhibition was due 
to the channel function of Kir2.1 because a non-conducting Kir2.1 
mutant (Extended Data Fig. 4) had no effect (Extended Data Fig. 5). 
Thus, perturbing layer 2/3 pyramidal cell excitability disrupts the 
proportionality between excitation and inhibition (Fig. 3j). These data 
indicate that pyramidal cell activity contributes to the equalization of 
E/I ratios across pyramidal cells. 

If pyramidal cell activity contributes to establishing the proportion- 
ality between layer-4-mediated excitation and Pvalb-cell-mediated in- 
hibition, then the decrease in excitability should selectively decrease 

Figure 3 | Suppressing pyramidal cell activity reduces inhibition but not 
excitation. a, Fluorescent image of a V1 coronal section showing Kir2.1-T2A-tdTomato 
overexpression in a small subset of layer 2/3 pyramidal cells (9 ± 1%, 
mean ± s.e.m., n = 12 sections from six mice). Cortical layers are identified by 
NeuN staining. L, layer; WM, white matter. b, Left, schematic of in vivo 
experiments. Right, a Kir2.1 neuron (upper panel) and a control neuron (lower 
panel) were sequentially recorded with Alexa Fluor 488-filled pipettes. 
c, Recordings from a control and a Kir2.1 neuron show spontaneous and 
visual-evoked spikes. Grey box, visual stimulation period. Note reduced spiking in 
Kir2.1 neuron. d-f, Cumulative frequencies of evoked spike rate (d, median: 
control, 0.50 Hz; Kir2.1, 0.061 Hz; P < 0.0001), spontaneous spike rate 
(e, median: control, 0.16 Hz; Kir2.1, 0.017 Hz; P < 0.0001) and overall spike 
rate (f, median: control, 0.25 Hz; Kir2.1, 0.043 Hz; P < 0.0001) from 38 control 
neurons and 37 Kir2.1 neurons. g, Left, schematic of slice experiments. 
Scnn1a-Cre-Tg3 mice with ChR2 in layer 4 excitatory neurons and Kir2.1 in a subset of 
layer 2/3 pyramidal cells. Right, monosynaptic EPSCs and disynaptic IPSCs 
from simultaneously recorded control and Kir2.1 neurons in response to layer 4 
photoactivation. Note similar EPSC but smaller disynaptic IPSC in Kir2.1 
neuron compared with control neuron. h-j, Summary graphs. h, Left, EPSC 
amplitudes in Kir2.1 neurons plotted against those in control neurons. Right, 
logarithm of the ratio between EPSC amplitudes in Kir2.1 and control neurons. 
Red, mean ± s.e.m. EPSC average amplitudes are similar between Kir2.1 
and control neurons (n = 25, P = 0.8). i, As in h, but for disynaptic IPSCs. 
Disynaptic IPSC amplitudes in Kir2.1 neurons are 27% of those in control 
neurons (n = 18, P = 0.0003). j, As in h, but for E/I ratios. E/I ratios in Kir2.1 
neurons are threefold those in control neurons (n = 18, P = 0.004). 

Pvalb- but not Sst-cell-mediated inhibition. Conversely, an increase in 
excitability should selectively increase Pvalb-cell-mediated inhibition. 
Indeed, Pvalb-cell-mediated inhibition was significantly smaller in Kir2.1 
than in control neurons, whereas Sst-cell-mediated inhibition was sim- 
ilar (Fig. 4a-e). Overexpression of the non-conducting Kir2.1 mutant 
did not affect Pvalb-cell-mediated inhibition (Extended Data Fig. 5). 
We used a bacterial voltage-gated Na⁺ channel (mNaChBac) to enhance 
neuronal excitability. Neurons expressing mNaChBac generate 

Figure 4 | Bidirectional regulation of Pvalb- but not Sst-cell-mediated 
inhibition. a, Schematic of chronic and acute expression of Kir2.1. Red bars, 
approximate Kir2.1 expression time course. b, Left, schematic of experiments. 
Pvalb-ires-Cre mice with ChR2 in Pvalb cells and Kir2.1 in a subset of layer 2/3 
pyramidal cells. Right, IPSCs from simultaneously recorded control neuron 
and neuron chronically or acutely expressing Kir2.1 in response to Pvalb cell 
photoactivation. Note smaller IPSCs in Kir2.1 neuron. c, Summary graphs. Left, 
IPSC amplitudes in Kir2.1 neurons plotted against those in control neurons. 
Right, logarithm of the ratio between IPSC amplitudes in Kir2.1 and control 
neurons. Red, mean ± s.e.m. IPSC amplitudes in Kir2.1 neurons are 23% 
(n = 36, P < 0.0001) and 31% (n = 16, P = 0.0005) of those in control neurons 
for chronic and acute conditions, respectively. d, e, As in b, c, but for 
Sst-ires-Cre mice with ChR2 in Sst cells and Kir2.1 chronically in a subset of layer 2/3 
pyramidal cells. On average, IPSC amplitudes are similar between Kir2.1 
and control neurons (n = 26, P = 0.3). f, Schematic of chronic and acute 
expression of mNaChBac. Magenta bars, approximate mNaChBac expression 
time course. g, h, As in b, c, but for mNaChBac. IPSC amplitudes in mNaChBac 
neurons are 2.7-fold (n = 18, P = 0.001) and 2.2-fold (n = 24, P = 0.0003) 
those in control neurons for chronic and acute conditions, respectively. 
i, j, As in d, e, but for mNaChBac. On average, IPSC amplitudes are similar 
between mNaChBac and control neurons (n = 17, P = 0.7). 

long-lasting action potentials and depolarization of the order of hundreds 
of milliseconds (Extended Data Fig. 6). Because constitutive expression 
of mNaChBac in cortical neurons from embryonic day 15.5 
(E15.5) caused a neuronal migration defect (Extended Data Fig. 7), we 
devised a Flpo recombinase-mediated flip-excision strategy, F-FLEX 
switch (Extended Data Fig. 8), to conditionally express mNaChBac 
postnatally. We combined in utero electroporation of a Flpo-dependent 
mNaChBac-expressing plasmid, to randomly transfect a small subset 
of layer 2/3 pyramidal cells, with injection of an AAV expressing Flpo 
at postnatal day 1 (P1), to turn on mNaChBac expression. This allowed 
us to concurrently express ChR2 in Pvalb or Sst cells, and mNaChBac 
in layer 2/3 pyramidal cells without affecting their migration (Extended 
Data Fig. 7). Pvalb-cell-mediated inhibition was significantly larger in 
mNaChBac neurons than in control neurons (Fig. 4f-h), and a non- 
conducting mNaChBac mutant (Extended Data Fig. 6) had no effect 
(Extended Data Fig. 5). mNaChBac expression did not alter Sst-cell- 
mediated inhibition (Fig. 4i, j). To determine whether more acute 
perturbations of layer 2/3 pyramidal cell excitability also alter Pvalb-cell- 
mediated inhibition, we used Flpo and the F-FLEX switch to express Kir2.1 
or mNaChBac for only approximately 1 week starting around postnatal 
days 12-14. This acute decrease (Kir2.1) or increase (mNaChBac) in 
excitability caused a decrease or an increase in Pvalb-cell-mediated in- 
hibition, respectively, similar to the changes caused by the chronic ex- 
pression of Kir2.1 or mNaChBac (Fig. 4a-c, f-h). These data indicate 
that the proportionality between layer-4-mediated excitation and Pvalb- 
cell-mediated inhibition is equalized across pyramidal cells through 
the bidirectional modulation of the strength of Pvalb cell synapses. 
The above results show that the spatial heterogeneity of Pvalb-cell- 
mediated inhibition ensures the equalization of E/I ratios across pyramidal 
cells. Is the inhibition mediated by a single Pvalb cell also heterogen- 
eous across its targeted pyramidal cells? We first determined whether 
the relative amplitudes of unitary IPSCs (uIPSCs) mediated by a Pvalb 
cell onto its targets are predicted by the relative activity of these targets. 
We suppressed the activity of a small subset of layer 2/3 pyramidal cells 
by overexpressing Kir2.1 and simultaneously recorded from a layer 2/3 
Pvalb cell, a control and a Kir2.1 neuron (Fig. 5a). Although the Pvalb- 
to-pyramidal cell connectivity was similarly high, regardless of whether 
pyramidal cells overexpressed Kir2.1 (Fig. 5f), uIPSC amplitude was sig- 
nificantly smaller in Kir2.1 neurons than in control neurons (Fig. 5b, f). 
All properties of the unitary connections between Pvalb cells and con- 
trol neurons were similar to those recorded in mice that were not trans- 
fected with Kir2.1 (Extended Data Fig. 9), indicating a cell-autonomous 
effect of Kir2.1 overexpression. We assessed the variability of uIPSC 
amplitudes originating from a single Pvalb cell and determined its depen- 
dency on the activity of the targeted pyramidal cells. We simultaneously 
recorded from a layer 2/3 Pvalb cell and two or three nearby pyramidal 
cells that were either all control or all Kir2.1 neurons (Fig. 5c, d). uIPSC 
amplitudes varied greatly from one control neuron to another, but less 
among Kir2.1 neurons (Fig. 5c-e, g), possibly because suppressing pyr- 
amidal cell activity cannot reduce uIPSC amplitudes below a certain level 
(flooring effect) (Extended Data Fig. 10). Thus, the inhibition generated 
by even an individual Pvalb cell onto its targets is remarkably hetero- 
geneous, and this heterogeneity reflects in part the activity profile of the 
targeted pyramidal cell population. Hence, despite the indiscriminate 
connectivity of Pvalb cells, the amount of inhibition that they provide 
onto each of their targets is adjusted to equalize the E/I ratios (Fig. 5h). 
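
The statement that connectivity is similarly high in both groups can be checked against the counts reported in the Fig. 5f legend (109 of 116 control and 114 of 126 Kir2.1 neurons connected). The sketch below computes the two rates and a two-sided Fisher's exact test. The choice of Fisher's exact test is an assumption, because the test behind the reported P value is not stated in this passage, and the function names are hypothetical.

```python
from math import comb

def connectivity_rate(connected, tested):
    """Percentage of tested pairs that showed a unitary connection."""
    return 100.0 * connected / tested

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x connected pairs in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Counts from the Fig. 5f legend: connected and not connected pairs.
control_rate = connectivity_rate(109, 116)
kir_rate = connectivity_rate(114, 126)
p_value = fisher_exact_two_sided(109, 116 - 109, 114, 126 - 114)
```

Both rates are above 90%, so a difference in connection probability is unlikely to explain the smaller uIPSCs in Kir2.1 neurons.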

Both theoretical and experimental evidence indicates that the relationship 
between synaptic excitation and inhibition in the cerebral cortex 
is fundamental for sensory processing. Failure to establish or 
maintain this relationship may be the neural basis of neurological disorders 
such as schizophrenia and autism. We discover that E/I ratios 
are remarkably similar across different pyramidal cells despite large 
variations in the amplitudes of synaptic excitation and inhibition. The 
activity-dependent adjustment of inhibition to match excitation may 
result from activity-dependent gene expression. Our study provides 
insight into how two opposing synaptic inputs, layer-4-mediated 


[Figure 5 image, panels a-h. Recoverable labels: uIPSC amplitude (pA); Connectivity (%); Average relative deviation; Control (n = 109); Kir2.1 (n = 114); Equalization of E/I ratio; Layer 4 excitation.] 

Figure 5 | Inhibition mediated by individual Pvalb cells varies depending on 
targets' activity. a, Left, schematic of experiments. Right, uIPSCs from 
simultaneously recorded control and Kir2.1 neuron in response to an action 
potential in a Pvalb cell. Note smaller uIPSC in Kir2.1 neuron. b, Summary 
graphs of uIPSC amplitudes from 12 similar experiments (P = 0.005). 
Inset, mean ± s.e.m. c, Top, schematic of experiments. Bottom, uIPSCs 
simultaneously recorded from three control neurons in response to an action 
potential in a Pvalb cell. d, As in c, but for three Kir2.1 neurons. e, The uIPSCs 
from c and d were normalized by their respective mean amplitudes (scaled 
average). Note larger inter-cell variability of uIPSCs among control neurons. 
f, Cumulative frequencies of uIPSC amplitudes (control: n = 109, median, 
205.4 pA; Kir2.1: n = 114, 57.4 pA; P < 0.0001). Lower inset, mean ± s.e.m. of 
uIPSC amplitudes. Upper inset, unitary connectivity rates from Pvalb cells to 
control (109 out of 116) and Kir2.1 (114 out of 126) neurons are similar 
(P = 0.3). g, Summary graph for the average relative deviations of uIPSCs from 
37 and 42 experiments as in c and d. Bars, mean ± s.e.m. The average relative 
deviations for Kir2.1 neurons are 33% smaller than those for control neurons 
(P = 0.009). h, Schematic of the equalized E/I ratios across cortical neurons. 

excitation and Pvalb-cell-mediated inhibition, remain proportional across 
a population of pyramidal cells. Thus it reveals an unanticipated degree 
of order in the distribution of synaptic strengths in cortical space. 


METHODS SUMMARY 


Cre-dependent recombinant AAV vectors were injected postnatally into Cre- 
expressing mice to conditionally express ChR2. Plasmids were electroporated in 
utero at embryonic day 14.5 or 15.5 to transfect a small, random subset of layer 2/3 
pyramidal cells. For in vitro physiology, mice at postnatal days 14-23 were anaes- 
thetized and transcardially perfused. Coronal slices were perfused with artificial 
cerebrospinal fluid for whole-cell recordings at 31-32 °C. A light-emitting diode 
(470 nm) was used to deliver blue light to stimulate neurons via the activation of 
ChR2. For in vivo physiology, mice at postnatal days 17-23 were anaesthetized by 
intraperitoneal injection of chlorprothixene (5 mg kg⁻¹) followed by urethane 
(1.2 g kg⁻¹) and the body temperature was maintained at 37 °C. A craniotomy at 
V1 was performed and targeted loose-patch recordings were performed under the 

METHODS 


Mice. All procedures to maintain and use mice were approved by the Institutional 
Animal Care and Use Committee at the University of California, San Diego. Mice 
were maintained on a reverse 12-h:12-h light:dark cycle with regular mouse chow 
and water ad libitum. CD-1 mice were purchased from Charles River Laboratories 
or Harlan Laboratories. Scnn1a-Cre-Tg3 (ref. 31), Fos-EGFP, Gad2-ires-Cre, 
Pvalb-ires-Cre, Sst-ires-Cre and Rosa-CAG-LSL-tdTomato-WPRE mice were 
obtained from the Jackson Laboratory (stock numbers 009613, 014135, 010802, 
008069, 013044 and 007909, respectively). Hemizygous transgenic mice and het- 
erozygous knock-in mice of both sexes were used in the experiments. 

DNA constructs and transfection of HEK cells. Two point mutations E224G 
and Y242F were introduced into mouse wild-type Kir2.1 (Kcnj2) to enhance its 
ability to suppress neuronal activity. Mutation E224G attenuates the Mg²⁺ and 
polyamine block of Kir2.1 to reduce its inward rectification. Mutation Y242F 
blocks tyrosine kinase phosphorylation of Kir2.1 at residue Y242 to enhance its 
plasma membrane surface expression. Three point mutations, G144A, Y145A 
and G146A, were introduced to generate a non-conducting channel. A Myc tag 
(EQKLISEEDL) was fused to the amino (N) termini of Kir2.1 E224G Y242F and 
Kir2.1 E224G Y242F G144A Y145A G146A, referred to as Kir2.1 and Kir2.1Mut, 
respectively. Both Kir2.1 and Kir2.1Mut were carboxy (C)-terminally fused with a 
T2A sequence (GSGEGRGSLLTCGDVEENPGP) followed by a tdTomato. The result- 
ing constructs were then cloned into a plasmid containing a CAG promoter (pCAG) 
to generate pCAG-Kir2.1-T2A-tdTomato and pCAG-Kir2.1Mut-T2A-tdTomato. 

The complementary DNA (cDNA) encoding a wild-type bacterial Na⁺ channel 
NaChBac was synthesized de novo and codon-optimized for mammalian expression 
(referred to as mNaChBac) by DNA2.0. A point mutation E191K was introduced 
to generate a non-conducting channel, referred to as mNaChBacMut. Both 
mNaChBac and mNaChBacMut were C-terminally fused with T2A-tdTomato and 
cloned into the pCAG plasmid to create pCAG-mNaChBac-T2A-tdTomato and 
pCAG-mNaChBacMut-T2A-tdTomato, respectively. 

F-FLEX cassette using two wild-type Frt sites and two F14 sites (Extended Data 
Fig. 8) was synthesized de novo and cloned into the plasmid pJ244 by DNA2.0 to 
generate pJ244-F-FLEX. mNaChBac-T2A-tdTomato and mNaChBacMut-T2A-tdTomato 
were subcloned into pJ244-F-FLEX in the inverted orientation. 
F-FLEX-mNaChBac-T2A-tdTomato and F-FLEX-mNaChBacMut-T2A-tdTomato cassettes 
were then subcloned into an AAV cis-plasmid containing an EF1α promoter to 
generate pAAV-EF1α-F-FLEX-mNaChBac-T2A-tdTomato and 
pAAV-EF1α-F-FLEX-mNaChBacMut-T2A-tdTomato, respectively. mNaChBac-T2A-tdTomato 
in pAAV-EF1α-F-FLEX-mNaChBac-T2A-tdTomato was replaced with inverted 
Kir2.1-T2A-tdTomato to generate pAAV-EF1α-F-FLEX-Kir2.1-T2A-tdTomato. 

An improved version of Flp recombinase, Flpo, was cloned into a pCAG plasmid 
and an AAV cis-plasmid containing a human synapsin promoter to generate pCAG-Flpo 
and pAAV-hSynapsin-Flpo, respectively. pCAG-EGFP, pCAG-mRFP and 
pCAG-Cre were obtained from Addgene (11150, 28311 and 13775, respectively). 

HEK-293FT cells (Life Technologies) were transfected with DNA constructs 
(0.1-0.5 μg) in 12-well plates using Lipofectamine 2000 (Life Technologies) to test 
their functionality. The following constructs were used in Extended Data Fig. 8c, d: 
pCAG-EGFP, pCAG-mRFP, pCAG-Flpo, pCAG-Cre, 
pAAV-EF1α-F-FLEX-mNaChBac-T2A-tdTomato and pAAV-EF1α-DIO-hChR2(H134R)-EYFP. 

In utero electroporation. Female CD-1 mice were crossed with male Scnn1a-Cre-Tg3, 
Pvalb-ires-Cre, or Sst-ires-Cre mice to obtain timed pregnancies. 
pCAG-Kir2.1-T2A-tdTomato, pCAG-Kir2.1Mut-T2A-tdTomato, pCAG-mNaChBac-T2A-tdTomato, 
and pCAG-mNaChBacMut-T2A-tdTomato were used at the final concentrations 
of 2-3 μg μl⁻¹. pAAV-EF1α-F-FLEX-mNaChBac-T2A-tdTomato, 
pAAV-EF1α-F-FLEX-mNaChBacMut-T2A-tdTomato or pAAV-EF1α-F-FLEX-Kir2.1-T2A-tdTomato 
(2-3 μg μl⁻¹ final concentration) was mixed with pCAG-EGFP (0.2 μg μl⁻¹ final 
concentration). Fast Green (Sigma-Aldrich, 0.01% final concentration) was added 
to the DNA solution. On embryonic day 14.5 or 15.5, female mice were anaesthetized 
with 2.5% isoflurane in oxygen at a flow rate of 1 l min⁻¹ and the body 
temperature was maintained by a feedback-based DC temperature control system 
(FHC) at 37 °C. Buprenorphine (3 μg, Reckitt Benckiser Healthcare) was administered 
subcutaneously along with 1 ml of Lactated Ringer's Injection (Baxter Healthcare). 
The abdominal fur was shaved and the skin was cleaned with 70% alcohol 
and iodine. A sterile towel drape was laid on the mouse with only the abdomen 
exposed. Midline incisions (2 cm) were made on the abdominal skin and wall, and 
the uteri were taken out of the abdominal cavity. A bevelled glass micropipette (tip 
size 100-μm outer diameter, 50-μm inner diameter) was used to penetrate the uterus 
and the embryo skull to inject about 1.5 μl of DNA solution into one lateral ventricle. 
Five pulses of current (voltage 39 V, duration 50 ms) were delivered at 1 Hz with a 
Tweezertrode (5-mm diameter) and a square-wave pulse generator (ECM 830, BTX 
Harvard Apparatus). The electrode paddles were positioned along the 70° angle to 
the brain’s sagittal plane. The cathode faced the occipital side of the injected ven- 
tricle to target the visual cortex. After electroporation, uteri were put back into the 
abdominal cavity, and the abdominal wall and skin were sutured. Mice were returned 
to the home cage and recovered from anaesthesia on a 37 °C Deltaphase Isothermal 
Pad (Braintree Scientific). Additional buprenorphine (3 μg) was administered 
subcutaneously on the next day. After birth, transfected pups were identified by the 
transcranial fluorescence of tdTomato or EGFP with a stereomicroscope (MVX10 
Macroview, Olympus). Only the pups in which the majority of the transfection occurred 
in the primary visual cortex were used for experiments. 

AAV production and injection. All recombinant AAV vectors were produced 
by the Penn Vector Core with the following titres: AAV2/9-CAGGS-Flex-ChR2- 
tdTomato* (Addgene 18917, titre 1.15 × 10° genome copies per millilitre), AAV2/1- 
CAGGS-Flex-ChR2-tdTomato (titre 6.86 × 10'* or 1.22 × 10'* genome copies 
per millilitre), AAV2/9-EF1α-DIO-hChR2(H134R)-EYFP** (Addgene 20298, titre 
6.24 × 10’ or 1.18 × 10'* genome copies per millilitre), AAV2/1-EF1α-DIO-hChR2 
(H134R)-EYFP (titre 3.41 × 10° genome copies per millilitre) and AAV2/9- 
hSynapsin-Flpo (titre 1.57 × 10'* genome copies per millilitre). 

Injection at postnatal days 0-2. AAV was injected into the V1 of pups using a 
Nanoject II nanolitre injector (Drummond Scientific Company). Pups were anaes- 
thetized by hypothermia and secured on a custom-made plate. Fast Green (0.01% 
final concentration) was added to the virus solution for visualization. A bevelled 
glass micropipette (tip size 60-µm outer diameter, 30-µm inner diameter) was 
used to penetrate the scalp and skull, and to inject AAV at different depths (600, 
500, 400 and 300 µm below the scalp) of one location (1.6 mm lateral and 0.3 mm 
caudal from the lambda). A total of about 80-180 nl (adjusted based on the virus 
titres) of virus solution was injected over 60 s. After injection, the micropipette was 
kept in the parenchyma at 300-µm depth for 30 s before being slowly withdrawn. 
Pups were placed on a 37 °C Deltaphase Isothermal Pad to recover from anaes- 
thesia and then were returned to the dam. For chronic expression of mNaChBac or 
mNaChBacMut, a mix of AAV2/9-hSynapsin-Flpo (titre 1.57 × 10° genome copies 
per millilitre) and AAV2/9-EF1α-DIO-hChR2(H134R)-EYFP (titre 1.18 × 10° 
genome copies per millilitre) at a ratio of 1:3 was injected on postnatal day 1. 

Injection at postnatal days 12-14. For acute expression of Kir2.1 or mNaChBac, 
mice previously injected with AAV2/1-EF1α-DIO-hChR2(H134R)-EYFP between 
postnatal days 0 and 2 were injected with AAV2/9-hSynapsin-Flpo between post- 
natal days 12 and 14. Mice were anaesthetized with 2.5% isoflurane in oxygen at a 
flow rate of 1 l min⁻¹ and the body temperature was maintained by a feedback- 
based DC temperature control system at 37 °C. Buprenorphine (1 µg) was admini- 
stered subcutaneously along with 0.1 ml of Lactated Ringer’s Injection. Lubricant 
ophthalmic ointment (Artificial Tears Ointment, Rugby Laboratories) was applied 
to the corneas to prevent drying. The scalp fur was shaved and the skin was cleaned 
with 70% alcohol and iodine. A small incision (0.5 cm) was made on the skin medial 
to the visual cortex. The skull at the injection site (2.5 mm lateral to the midline and 
1 mm rostral to the lambda suture; the same site that was previously electroporated 
in utero and virally injected between postnatal days 0 and 2) was thinned with a 
0.3-mm diameter round bur (Busch Bur, Gesswein) on a high-speed rotary micro- 
motor (Foredom) such that the injection glass micropipette (tip size 50-µm outer 
diameter, 25-µm inner diameter) could penetrate the skull. A total of 150 nl of virus 
solution was injected 450 µm below the skull at a rate of 20 nl min⁻¹ using an 
UltraMicroPump III and a Micro4 controller (World Precision Instruments). After 
the injection, the micropipette was kept in the parenchyma for 5-10 min before 
being slowly withdrawn. The skin was sutured. Mice were returned to their home 
cage to recover from anaesthesia on a 37 °C Deltaphase Isothermal Pad. 

Immunocytochemistry. Mice were anaesthetized by an intraperitoneal injection 
of a ketamine and xylazine mix (100 mg kg⁻¹ and 10 mg kg⁻¹, respectively), and 
were transcardially perfused with phosphate buffered saline (PBS, pH 7.4) followed 
by 4% paraformaldehyde in PBS (pH 7.4). Brains were removed, further fixed 
overnight in 4% paraformaldehyde, cryoprotected with 30% sucrose in PBS and 
frozen in optimum cutting-temperature medium until sectioning. A HM 450 Slid- 
ing Microtome (Thermo Scientific) was used to section the brains to obtain 30-50-µm 
coronal slices. Slices were blocked with 1% bovine serum albumin, 2% normal goat 
serum and 0.3% Triton X-100 in PBS at room temperature for 1 h and incubated 
with primary antibodies in working buffer (0.1% bovine serum albumin, 0.2% nor- 
mal goat serum, 0.3% Triton X-100 in PBS) at 4 °C overnight. The following prim- 
ary antibodies were used: rabbit anti-RFP (1:200, Abcam), rat anti-RFP (1:300, 
Chromotek), chicken anti-GFP (1:500, Aves Labs), rabbit anti-GFP (1:2000, Life 
Technologies) and mouse anti-NeuN (1:200, Millipore). The slices were washed 
four times with working buffer for 10 min each, incubated with secondary antibodies 
conjugated with Alexa Fluor 488, 594 or 647 (1:500 or 1:1,000, Life Technologies) 
in working buffer for 1 h at room temperature, and then washed four times with 
working buffer for 10 min each. NeuroTrace 435/455 blue fluorescent Nissl stain 
(1:200, Life Technologies) was used to label neurons after antibody staining. Slices 
were mounted in Vectashield Mounting Medium containing 4′,6-diamidino-2- 
phenylindole (DAPI) (Vector Laboratories) or ProLong Gold antifade reagent (Life 
Technologies). Images were acquired on an Olympus FV1000 Confocal, a Zeiss 
Axio Imager A1 or an Olympus MVX10 Macroview, and processed using National 
Institutes of Health ImageJ. To estimate the fraction of layer 2/3 pyramidal cells 
that were transfected by in utero electroporation (Fig. 3a), transfected neurons 
(tdTomato+) and total neurons (NeuN+) in layer 2/3 were visually quantified. 
Assuming that 13.2% of layer 2/3 neurons are inhibitory interneurons (Extended 
Data Fig. 3), we estimated that 9 ± 1% (mean ± s.e.m., n = 12 sections from six 
mice) of layer 2/3 pyramidal cells were transfected. 
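
The correction for interneurons described above can be sketched as follows. This is a minimal illustration rather than the analysis code used for the study: it assumes that every transfected (tdTomato+) neuron is a pyramidal cell and that pyramidal cells make up the non-inhibitory 86.8% of NeuN+ neurons. The function name and the example counts are hypothetical.

```python
def pyramidal_fraction_transfected(n_transfected, n_neurons,
                                   interneuron_fraction=0.132):
    """Estimate the fraction of layer 2/3 pyramidal cells that are transfected.

    Assumption (ours): all transfected neurons are pyramidal cells, and
    pyramidal cells are the (1 - interneuron_fraction) share of all neurons.
    """
    if n_neurons <= 0:
        raise ValueError("n_neurons must be positive")
    if not 0.0 <= interneuron_fraction < 1.0:
        raise ValueError("interneuron_fraction must be in [0, 1)")
    n_pyramidal = n_neurons * (1.0 - interneuron_fraction)
    return n_transfected / n_pyramidal


# Hypothetical counts for one section (not data from the paper).
example_fraction = pyramidal_fraction_transfected(n_transfected=10, n_neurons=100)
print(f"{100 * example_fraction:.1f}% of pyramidal cells transfected")
```

Per-section estimates obtained this way would then be averaged across sections to give a mean and s.e.m.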

In vitro physiology. Mice between postnatal days 14 and 23 were anaesthetized 
by an intraperitoneal injection of a ketamine and xylazine mix (100 mg kg⁻¹ and 
10 mg kg⁻¹, respectively), and transcardially perfused with cold (0-4 °C) slice cut- 
ting solution containing 80 mM NaCl, 2.5 mM KCl, 1.3 mM NaH₂PO₄, 26 mM 
NaHCO₃, 4 mM MgCl₂, 0.5 mM CaCl₂, 20 mM D-glucose, 75 mM sucrose and 
0.5 mM sodium ascorbate (315 mosmol, pH 7.4, saturated with 95% O₂/5% CO₂). 
Brains were removed and sectioned in the cutting solution with a Super Micro- 
slicer Zero1 (D.S.K.) to obtain 300-µm coronal slices. Slices were incubated in a 
custom-made interface holding chamber saturated with 95% O₂/5% CO₂ at 34 °C 
for 30 min and then at room temperature for 20 min to 8 h until they were trans- 
ferred to the recording chamber. 

Recordings were performed on submerged slices in artificial cerebrospinal fluid 
(ACSF) containing 119 mM NaCl, 2.5 mM KCl, 1.3 mM NaH₂PO₄, 26 mM NaHCO₃, 
1.3 mM MgCl₂, 2.5 mM CaCl₂, 20 mM D-glucose and 0.5 mM sodium ascorbate 
(300 mosmol, pH 7.4, saturated with 95% O₂/5% CO₂, perfused at 3 ml min⁻¹) at 
31-32 °C. For whole-cell recordings, we used a K⁺-based pipette solution contain- 
ing 142 mM K⁺-gluconate, 10 mM HEPES, 1 mM EGTA, 2.5 mM MgCl₂, 4 mM 
ATP-Mg, 0.3 mM GTP-Na, 10 mM Na₂-phosphocreatine (295 mosmol, pH 7.35) 
or a Cs⁺-based pipette solution containing 115 mM Cs⁺-methanesulphonate, 
10 mM HEPES, 1 mM EGTA, 1.5 mM MgCl₂, 4 mM ATP-Mg, 0.3 mM GTP- 
Na, 10 mM Na₂-phosphocreatine, 2 mM QX 314-Cl, 10 mM BAPTA-tetracesium 
(295 mosmol, pH 7.35). Membrane potentials were not corrected for liquid junc- 
tion potential (experimentally measured as 11.4 mV for the K⁺-based pipette solu- 
tion and 8.4 mV for the Cs⁺-based pipette solution). 

Neurons were visualized with video-assisted infrared differential interference con- 
trast imaging and fluorescent neurons were identified by epifluorescence imaging 
under a water immersion objective (×40, 0.8 numerical aperture) on an upright 
Olympus BX51WI microscope with an infrared CCD camera (VX44, Till Photonics). 
For Fos-EGFP experiments, in a given field-of-view those pyramidal cells with the 
strongest EGFP fluorescence were visually identified as the EGFP+ neurons. The 
EGFP− neurons were those pyramidal cells whose fluorescence was equal to the 
background fluorescence level of the slices. 

Data were low-pass filtered at 4 kHz and acquired at 10 kHz with an Axon Mul- 
ticlamp 700A or 700B amplifier and an Axon Digidata 1440A Data Acquisition 
System under the control of Clampex 10.2 (Molecular Devices). Data were ana- 
lysed offline using AxoGraph X (AxoGraph Scientific). 

For the photostimulation of ChR2-expressing neurons, blue light was emitted 
from a collimated light-emitting diode (470 nm) driven by a T-Cube LED Driver 
(Thorlabs) under the control of an Axon Digidata 1440A Data Acquisition System 
and Clampex 10.2. Light was delivered through the reflected light fluorescence 
illuminator port and the ×40 objective. 

Synaptic currents were recorded in the whole-cell voltage clamp mode with the 
Cs⁺-based patch pipette solution. Only recordings with series resistance below 
20 MΩ were included. EPSCs and IPSCs were recorded at the reversal potential for 
IPSCs (−60 mV) and EPSCs (+10 mV), respectively. For light pulse stimulation, 
pulse duration (0.5-5 ms) and intensity (1.1-5.5 mW mm⁻²) were adjusted for 
each recording to evoke small (to minimize voltage-clamp errors; see the figures 
for the ranges) but reliable monosynaptic EPSCs or IPSCs. Disynaptic IPSCs were 
evoked using the same light pulses that were used for evoking the corresponding 
monosynaptic EPSCs. Light pulses were delivered at 30-s interstimulus intervals. 

To quantify the inter-cell variability of EPSCs (Fig. 1d, f), we used the average 
relative deviation defined as 

$$\frac{\sum_{i=1}^{N} \left| \mathrm{EPSC}_i - \mathrm{EPSC}_{\mathrm{mean}} \right|}{N \times \mathrm{EPSC}_{\mathrm{mean}}},$$

where $N$ is the number of pyramidal cells in one given experiment, $\mathrm{EPSC}_i$ 
is the amplitude of the EPSC recorded in the $i$th pyramidal cell within that 
experiment and $\mathrm{EPSC}_{\mathrm{mean}}$ is the mean amplitude of EPSCs 
recorded across pyramidal cells in the same experiment. The average relative 
deviation of IPSCs or E/I ratios was obtained in the same way for each experiment 
(Fig. 1d, f). 
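
As an illustration of this definition, a minimal Python sketch is given below. It is not the analysis code used for the study; the function name is hypothetical and amplitudes are assumed to be supplied as positive magnitudes from the pyramidal cells of one experiment.

```python
def average_relative_deviation(amplitudes):
    """Return sum(|x_i - mean|) / (N * mean) for one experiment.

    `amplitudes` holds one value per pyramidal cell (EPSC, IPSC or E/I ratio),
    assumed here to be given as positive magnitudes.
    """
    n = len(amplitudes)
    if n == 0:
        raise ValueError("at least one amplitude is required")
    mean = sum(amplitudes) / n
    if mean == 0:
        raise ValueError("mean amplitude must be non-zero")
    return sum(abs(a - mean) for a in amplitudes) / (n * mean)


# Hypothetical EPSC amplitudes (pA) from three cells; not data from the paper.
print(average_relative_deviation([100.0, 200.0, 300.0]))
```

The measure is dimensionless and equals zero when all cells in the experiment have the same amplitude.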

To record unitary connections between inhibitory neurons and pyramidal cells, 
Pvalb and Sst cells were identified by the Cre-dependent expression of ChR2- 
tdTomato or hChR2(H134R)-EYFP in Pvalb-ires-Cre and Sst-ires-Cre mice, res- 
pectively. Pyramidal cells were first recorded in whole-cell voltage clamp mode 
(+10 mV) with the Cs⁺-based patch pipette solution, and a nearby Pvalb or Sst 
cell was subsequently recorded in the whole-cell current clamp mode with the K⁺- 
based patch pipette solution. Action potentials were elicited in Pvalb or Sst cells 
by a 2-ms depolarizing current step (1-2 nA) at 15-s interstimulus intervals. 


Unitary IPSC (uIPSC) amplitudes were measured from the average of 10-50 
sweeps. We considered a Pvalb or Sst cell to be connected with a pyramidal cell 
when the average uIPSC amplitude was at least three times the baseline standard 
deviation. The average relative deviation of uIPSC amplitudes (Fig. 5 and Extended 
Data Fig. 9) was calculated as 

$$\frac{\sum_{i=1}^{N} \left| \mathrm{uIPSC}_i - \mathrm{uIPSC}_{\mathrm{mean}} \right|}{N \times \mathrm{uIPSC}_{\mathrm{mean}}},$$

where $N$ is the number of pyramidal cells in one given experiment, $\mathrm{uIPSC}_i$ 
is the amplitude of the IPSC recorded in the $i$th pyramidal cell within that 
experiment and $\mathrm{uIPSC}_{\mathrm{mean}}$ is the mean amplitude of uIPSCs 
recorded across pyramidal cells in the same experiment. 
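
The connectivity criterion can be sketched as a simple threshold test. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the baseline standard deviation is the sample standard deviation of a pre-stimulus segment of the averaged trace, which the text does not specify.

```python
import statistics


def is_connected(sweep_amplitudes_pA, baseline_samples_pA, factor=3.0):
    """Apply the 'at least three times the baseline s.d.' criterion.

    sweep_amplitudes_pA: uIPSC amplitude measured in each sweep (10-50 sweeps).
    baseline_samples_pA: baseline current samples (assumed pre-stimulus segment).
    """
    mean_amplitude = statistics.mean(sweep_amplitudes_pA)
    baseline_sd = statistics.stdev(baseline_samples_pA)
    return mean_amplitude >= factor * baseline_sd


# Hypothetical values; not data from the paper.
baseline = [-1.0, 1.0, -1.0, 1.0]
print(is_connected([10.0, 12.0, 14.0], baseline))  # large response
print(is_connected([2.0, 3.0], baseline))          # response within noise
```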

Neuronal intrinsic excitability was examined with the K⁺-based pipette solu- 
tion in the presence of the AMPA receptor antagonist NBQX (10 µM), the NMDA 
receptor antagonist (RS)-CPP (10 µM) and the GABA_A (γ-aminobutyric acid) 
receptor antagonist SR 95531 (10 µM). The resting membrane potential was recorded 
in the whole-cell current clamp mode within the first minute after break-in. The input 
resistance was measured after balancing the bridge by injecting a 500-ms-long hyper- 
polarizing current pulse (10-100 pA) to generate a small membrane potential hyper- 
polarization (2-10 mV) from the resting membrane potential. Depolarizing currents 
were increased in 5- or 10-pA steps to identify rheobase currents. 

Ba²⁺-sensitive currents were measured with the K⁺-based pipette solution in 
the presence of NBQX (10 µM), (RS)-CPP (10 µM), SR 95531 (10 µM) and Na⁺ 
channel blocker TTX (1 µM). Only recordings with series resistance below 20 MΩ 
were included. Neurons were clamped at −25 mV and the membrane potential 
was ramped to −125 mV at a rate of 20 mV s⁻¹. The membrane currents recorded 
in the presence of BaCl₂ (50 µM) were subtracted from those recorded in the absence 
of BaCl₂ to obtain the Ba²⁺-sensitive currents, which were then divided by the whole- 
cell membrane capacitances to calculate the current densities. 
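
The subtraction and normalization steps can be illustrated with the short sketch below. It is not the analysis code used for the study; the function names and example values are hypothetical, and the two ramp traces are assumed to be sampled at the same membrane potentials.

```python
def ba_sensitive_current_density(control_pA, barium_pA, capacitance_pF):
    """Return the Ba2+-sensitive current density (pA/pF) at each sample.

    The trace recorded in BaCl2 is subtracted from the control trace, and the
    difference is divided by the whole-cell membrane capacitance.
    """
    if len(control_pA) != len(barium_pA):
        raise ValueError("traces must have the same number of samples")
    if capacitance_pF <= 0:
        raise ValueError("capacitance must be positive")
    return [(c - b) / capacitance_pF for c, b in zip(control_pA, barium_pA)]


def ramp_duration_s(v_start_mV=-25.0, v_end_mV=-125.0, rate_mV_per_s=20.0):
    """Duration of a linear voltage ramp between two potentials."""
    return abs(v_end_mV - v_start_mV) / rate_mV_per_s


# Hypothetical currents (pA) at two ramp potentials; not data from the paper.
densities = ba_sensitive_current_density([-500.0, -100.0], [-200.0, -80.0], 100.0)
print(densities)
print(ramp_duration_s())
```

With the stated start and end potentials and ramp rate, the ramp lasts 5 s.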

In vivo physiology. Mice between postnatal days 17 and 23 were anaesthetized by 
an intraperitoneal injection of chlorprothixene (5 mg kg⁻¹) followed by urethane 
(1.2 g kg⁻¹). Oxygen was given at a flow rate of 1 l min⁻¹ during the experiments 
and isoflurane (<0.5%) was supplemented if necessary. The body temperature was 
maintained by a feedback-based DC temperature control system at 37 °C. Dexa- 
methasone sodium phosphate (2 mg kg⁻¹) and Lactated Ringer's Injection (3 ml kg⁻¹ 
every 2 h) were administered subcutaneously. Whiskers and eyelashes were trimmed, 
and a thin layer of silicone oil (kinematic viscosity 30,000 centistokes (1 cSt = 
10⁻⁶ m² s⁻¹), Sigma-Aldrich) was applied to the corneas to prevent drying. The 
scalp and periosteum were removed. Vetbond tissue adhesive (3M) was applied to 
stabilize all sutures. A custom-made recording chamber with a 3-mm diameter 
hole in the centre was attached to the skull over V1 with Vetbond tissue adhesive 
and dental cement (Ortho-Jet BCA, Lang Dental). The recording chamber was 
then secured on a custom-made holder. A craniotomy (1.5-2 mm diameter, cen- 
tred at 2.5 mm lateral to midline and 1 mm rostral to lambda suture) was performed 
with a 0.3-mm diameter round bur on a high-speed rotary micromotor. The dura 
was left intact and the craniotomy was covered by a thin layer of 1.5% type III-A 
agarose in HEPES-ACSF containing 142 mM NaCl, 5 mM KCl, 10 mM HEPES-Na, 1.3 mM 
MgCl₂, 3.1 mM CaCl₂ and 10 mM D-glucose (310 mosmol, pH 7.4). HEPES-ACSF 
was added to the recording chamber. 

Targeted loose-patch recordings were performed under the guidance of a two- 
photon laser scanning microscope. Two-photon imaging was performed with a 
water immersion objective (×40, 0.8 numerical aperture, Olympus) on a Moveable 
Objective Microscope (Sutter Instruments) coupled with a Ti:sapphire laser (Cha- 
meleon Ultra II, Coherent) under the control of ScanImage 3.6 (Janelia Farm 
Research Campus, HHMI)”. Laser wavelength was tuned to 950 nm (laser power 
after the objective: 25-50 mW) for two-photon excitation of tdTomato and Alexa 
Fluor 488. 

An Axon Multiclamp 700B amplifier was used for extracellular recording of 
spikes. A patch pipette containing HEPES-ACSF and 10-20 µM Alexa Fluor 488 
hydrazide (Life Technologies) was advanced along its axis towards neurons located 
between 150 and 250 µm below the dura with minimal lateral movements. A small 
positive pressure was applied to the patch pipette to avoid clogging of the tip and 
to inject a small amount of fluorescent dye to stain the extracellular space. Non- 
fluorescent neurons were visualized as negative images**. The pipette resistance 
was constantly monitored in voltage-clamp mode. The concurrence of the pipette 
tip contacting the neuron and an increase in pipette resistance indicated successful 
targeting, which was further confirmed post hoc (see below). Upon the release of 
positive pressure, a small negative pressure was applied to form a loose seal (10- 
30 MΩ). The amplifier was then switched to the current-clamp mode with zero 
current injection to record voltage. Data were low-pass filtered at 10 kHz and acquired 
at 32 kHz with a NI-DAQ board (NI PCIe-6259, National Instruments) under the 
control of a custom-written program running in Matlab (Mathworks). Within a 
local region (<50 µm), neighbouring tdTomato⁺ and tdTomato⁻ neurons were 
sequentially targeted for recording, but the order of recordings from tdTomato⁺ 
and tdTomato⁻ was alternated. The correct targeting of tdTomato⁺ neurons was 
confirmed at the end of the recording either by the filling of the neuron with the 
fluorescent dye contained in the pipette via break-in or by the presence of neuronal 
fluorescence in the recording pipette due to the negative pressure. 

Visual stimuli were generated in Matlab with Psychophysics Toolbox” and 
displayed on a gamma-corrected liquid-crystal display monitor (30 cm × 47.5 cm, 
60 Hz refresh rate, mean luminance 50 cd m⁻²). The monitor was placed 25 cm away 
from the contralateral eye, covering 62° (vertical) × 87° (horizontal) of the visual 
space. The monitor was approximately centred at the retinotopic location corres- 
ponding to the V1 recording site by monitoring single-unit or multi-unit activity in 
response to a moving bar on the screen. During recordings, full-field sinusoidal 
drifting gratings (temporal frequency 2 Hz, spatial frequency 0.04 cycles per degree, 
100% contrast) were presented randomly at 12 different directions from 0° to 330° 
for 1.5 s, preceded and followed by the presentation of a grey screen for 2 s and 1.5 s, 
respectively. The complete set of stimuli was repeated 8-16 times. 
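
A stimulus sequence of this kind can be generated as sketched below. This is an illustration only (the stimuli themselves were generated in Matlab): the function name is hypothetical, and it assumes block-wise randomization, in which each of the 12 directions is shown once per repeat in shuffled order, and a fixed 5-s trial (2 s grey, 1.5 s grating, 1.5 s grey).

```python
import random


def grating_schedule(n_repeats, n_directions=12, seed=None):
    """Return drift directions (degrees) for all trials, shuffled per repeat."""
    rng = random.Random(seed)
    directions = [i * 360 // n_directions for i in range(n_directions)]
    schedule = []
    for _ in range(n_repeats):
        block = directions[:]      # each direction appears once per repeat
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


PRE_GREY_S, GRATING_S, POST_GREY_S = 2.0, 1.5, 1.5  # assumed trial structure

schedule = grating_schedule(n_repeats=8, seed=0)
total_time_s = len(schedule) * (PRE_GREY_S + GRATING_S + POST_GREY_S)
print(len(schedule), "trials,", total_time_s, "s in total")
```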

We analysed data offline using a custom-written program in Matlab. Voltage 
signals were high-pass filtered (125 Hz). Spikes were first detected as events exceed- 
ing five times the standard deviation of the noise, and then visually verified. 
Spontaneous spike rate was calculated as the average spike rate during the 2-s time 
window before the presentation of a visual stimulus. Evoked spike rate was calcu- 
lated as the average spike rate during the 1.5-s time window of visual stimulation. 
Overall spike rate was calculated as the average spike rate during the entire record- 
ing period. 
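
The detection and rate calculations can be sketched as follows. This is a minimal Python illustration, not the custom Matlab program used for the study. It assumes that the trace has already been high-pass filtered, that spikes are positive-going threshold crossings, and that the noise standard deviation is supplied by the caller; the visual verification step is not represented. All function names are hypothetical.

```python
def detect_spikes(trace, fs_hz, noise_sd, k=5.0):
    """Return times (s) of upward crossings of k times the noise s.d."""
    threshold = k * noise_sd
    times = []
    above = False
    for i, v in enumerate(trace):
        if v > threshold and not above:
            times.append(i / fs_hz)   # first sample of each supra-threshold event
        above = v > threshold
    return times


def spike_rate(spike_times, t_start, t_end):
    """Mean spike rate (Hz) in the half-open window [t_start, t_end)."""
    n = sum(1 for t in spike_times if t_start <= t < t_end)
    return n / (t_end - t_start)


def trial_rates(spike_times, stim_onset_s, pre_s=2.0, stim_s=1.5):
    """Spontaneous (pre-stimulus) and evoked (during stimulus) rates."""
    spontaneous = spike_rate(spike_times, stim_onset_s - pre_s, stim_onset_s)
    evoked = spike_rate(spike_times, stim_onset_s, stim_onset_s + stim_s)
    return spontaneous, evoked


# Hypothetical trace sampled at 10 Hz for readability; not data from the paper.
toy_trace = [0, 0, 0, 9, 9, 0, 0, 0, 9, 0]
toy_times = detect_spikes(toy_trace, fs_hz=10.0, noise_sd=1.0)
rates = trial_rates([0.5, 1.5, 2.2, 2.9, 3.4], stim_onset_s=2.0)
print(toy_times, rates)
```

The overall spike rate corresponds to calling `spike_rate` with the start and end of the whole recording.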

Statistics. All reported sample numbers (n) represent biological replicates. Sample 
sizes were estimated to have 70-80% power to detect expected effect size using 
StatMate 2 (GraphPad Software). Statistical analyses used Prism 5 (GraphPad Soft- 
ware) and Matlab. Linear regression with an F-test (two-sided) was used for Fig. 1e 
and Extended Data Fig. 2a, b. Bootstrapped distributions (Extended Data Fig. 2c) 
were used to determine the statistical significance for Fig. 1f. A Wilcoxon matched- 
pairs signed rank test (two-sided) was used for Figs 2, 3h-j, 4, 5b and Extended Data 
Figs 1-5. A Mann-Whitney U-test (two-sided) was used for Fig. 3d-f, uIPSC ampli- 
tudes in Figs 5f, g and Extended Data Figs 4, 6, 9c, d. Fisher’s exact test (two-sided) 
was used for connectivity rates in Fig. 5f and Extended Data Fig. 9b. 

Extended Data Figure 1 | Cre recombinase-expressing cells in the cortex of 
Scnn1a-Cre-Tg3 mice are layer 4 excitatory neurons. AAV-CAGGS-Flex- 
ChR2-tdTomato, expressing ChR2-tdTomato fusion protein in a Cre- 
dependent manner, was injected into Scnn1a-Cre-Tg3 mice. a, Representative 
fluorescent images of a coronal section of V1 showing that the ChR2- 
tdTomato-expressing cells were located primarily in layer 4 (n = 11 mice). Cortical 
layers are indicated on the right based on the DAPI staining pattern. L, layer; 
WM, white matter. b, Left, schematic of experiments. Right, a layer 2/3 
pyramidal cell was voltage clamped at the reversal potential for excitation 
(+10 mV). Photoactivation of ChR2-expressing neurons in layer 4 elicited an 
IPSC (black trace), which was abolished by the glutamatergic receptor 
antagonists NBQX and CPP (red trace), indicating its disynaptic nature. 
c, Summary data: NBQX and CPP reduced IPSC amplitudes by 98.0 ± 0.6% 
(mean ± s.e.m., n = 8, P = 0.008) indicating that ChR2 was exclusively 
expressed in excitatory neurons. 

Extended Data Figure 2 | Characterization of the inter-cell variability of 
EPSCs, IPSCs and E/I ratios. a, b, The inter-cell variability of EPSCs, IPSCs 
and E/I ratios among neighbouring pyramidal cells does not correlate with their 
inter-soma distances. a, The average relative deviations of EPSCs, IPSCs and 
E/I ratios from each experiment in Fig. 1f are plotted against the average inter- 
soma distance from the same experiment. The average inter-soma distance is 
the mean of the distances between each pair of pyramidal cells. For the 
experiments in which only two pyramidal cells were recorded, the inter-soma 
distance between the two pyramidal cells was used. Lines, linear regression fits. 
b, The absolute value of the logarithm of the ratio of EPSCs (or IPSCs or E/I 
ratios) simultaneously recorded in two pyramidal cells was plotted against the 
inter-somatic distance between the two cells. c, The distribution of E/I ratios 
across pyramidal cells varies less than if EPSCs and IPSCs were randomly 
paired between cells and less than the distributions of EPSC and IPSC 
amplitudes. To determine whether the precise E/I ratio recorded within each 
pyramidal cell minimizes the average relative deviation, we computed the E/I 
ratios from randomly but uniquely paired EPSCs and IPSCs within each of the 
20 experiments from Fig. 1f. By randomizing within each experiment, we 
ensured that the average relative deviation was only modified owing to the 
pairing of EPSCs to IPSCs. Note that, for an experiment with N pyramidal cells, 
there were N! possible randomized pairings of EPSCs and IPSCs, and hence N! 
possible E/I ratio average relative deviations (referred to as random-E/I ratio 
average relative deviations). The distribution of the means of the random-E/I 
ratio average relative deviations (grey histogram) was constructed from the 
means of 10,000 samples. Each sample consisted of 20 random-E/I ratio 
average relative deviations, each of which was randomly chosen from the N! 
possible random-E/I ratio average relative deviations of each experiment. 
The grey vertical line represents the mean of the distribution. The distribution 
of the means of the E/I ratio average relative deviations (black histogram) 
was generated by bootstrapping (that is, resampling 10,000 times with 
replacement). Each resample consisted of 20 randomly chosen E/I ratio average 
relative deviations from the 20 experiments in Fig. 1f, and an E/I ratio average 
relative deviation was allowed to be repeated within one resample (that is, 
sampling with replacement). The black vertical line represents the mean of the 
experimentally obtained E/I ratio average relative deviations. The E/I ratio 
average relative deviations are smaller than the random-E/I ratio average 
relative deviations (P < 0.0001). The distributions of the means of the EPSC 
average relative deviations (red histogram) and the means of the IPSC average 
relative deviations (blue histogram) were generated by similar bootstrapping to 
the E/I ratio average relative deviations. The red and blue vertical lines 
represent the means of the experimentally obtained EPSC average relative 
deviations and IPSC average relative deviations, respectively. The E/I ratio 
average relative deviations are smaller than the EPSC average relative 
deviations (P < 0.0001) and the IPSC average relative deviations (P < 0.0001). 
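
The two resampling procedures described in panel c can be sketched as follows. This is a minimal Python illustration, not the code used for the study: all function names and amplitudes are hypothetical, a uniform shuffle stands in for drawing one of the N! possible pairings, and the P-value calculation is omitted because its exact form is not described here.

```python
import random


def average_relative_deviation(values):
    """Sum of |x_i - mean| divided by (N * mean)."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / (n * mean)


def ei_ard(epscs, ipscs):
    """Average relative deviation of the E/I ratios of one experiment."""
    return average_relative_deviation([e / i for e, i in zip(epscs, ipscs)])


def random_pairing_means(experiments, n_samples, rng):
    """Each sample: one random EPSC-IPSC pairing per experiment, then the mean."""
    means = []
    for _ in range(n_samples):
        ards = []
        for epscs, ipscs in experiments:
            shuffled = list(ipscs)
            rng.shuffle(shuffled)  # uniform draw from the N! unique pairings
            ards.append(ei_ard(epscs, shuffled))
        means.append(sum(ards) / len(ards))
    return means


def bootstrap_means(ards, n_resamples, rng):
    """Means of resamples drawn with replacement from the observed values."""
    n = len(ards)
    return [sum(rng.choice(ards) for _ in range(n)) / n
            for _ in range(n_resamples)]


# Hypothetical amplitudes (pA) for two experiments; not data from the paper.
experiments = [
    ([100.0, 200.0, 300.0, 400.0], [200.0, 400.0, 600.0, 800.0]),
    ([50.0, 150.0, 250.0], [100.0, 300.0, 500.0]),
]
rng = random.Random(0)
observed = [ei_ard(e, i) for e, i in experiments]
random_means = random_pairing_means(experiments, 1000, rng)
observed_means = bootstrap_means(observed, 1000, rng)
print(observed, sum(random_means) / len(random_means))
```

In this toy example EPSCs and IPSCs are perfectly proportional within each experiment, so the observed E/I ratio deviations are zero while random pairings give larger values.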

Extended Data Figure 3 | Most layer 2/3 Fos-EGFP+ neurons in V1 are 
pyramidal cells. Fos-EGFP mice were crossed with Gad2-ires-Cre and 
Rosa-CAG-LSL-tdTomato-WPRE mice to generate Fos-EGFP, Gad2-ires-Cre, 
Rosa-CAG-LSL-tdTomato-WPRE mice. a, Representative fluorescent images 
showed a coronal section of V1. All neurons were visualized by NeuroTrace 
435/455 blue fluorescent Nissl stain and GABAergic interneurons were labelled 
by tdTomato. EGFP was stained with an antibody against GFP and visualized 
with a secondary antibody conjugated with Alexa Fluor 647. Cortical layers are 
indicated on the left based on the Nissl staining pattern. b, Enlarged view of the 
boxed region in a. In layer 2/3 of V1, only 5.3 ± 0.9% (mean ± s.e.m., n = 10 
sections from two mice) of EGFP+ neurons were GABAergic interneurons 
(two examples are indicated by arrowheads). GABAergic interneurons 
constitute 13.2 ± 0.6% (mean ± s.e.m., n = 14 sections from three mice 
including one Gad2-ires-Cre, Rosa-CAG-LSL-tdTomato-WPRE mouse) of all 
layer 2/3 neurons. 

Extended Data Figure 4 | Overexpression of Kir2.1 increases a 
Ba²⁺-sensitive K⁺ current and decreases neuronal excitability. a, Schematics 
of experiments. Kir2.1 or a non-conducting mutant Kir2.1 (Kir2.1Mut) 
was overexpressed in a subset of layer 2/3 pyramidal cells by in utero 
electroporation. b, Membrane currents in response to a 5 s membrane potential 
ramp from −25 to −125 mV from an untransfected control pyramidal cell, a 
pyramidal cell overexpressing Kir2.1 and a pyramidal cell overexpressing 
Kir2.1Mut. The purple traces were recorded in control condition and the grey 
traces were recorded in the presence of 50 µM BaCl₂, a concentration that 
primarily blocks the K⁺ channels of the Kir2 subfamily®. The blue traces were 
obtained by subtracting the grey traces from the purple traces, representing 
the Ba²⁺-blocked K⁺ currents. c, The exogenously overexpressed Kir2.1 
increased not only the Ba²⁺-blocked inward current density at −125 mV 
(P = 0.01), but also the outward current density at −45 mV (P = 0.001) owing 
to its reduced inward rectification (see Methods). d, Kir2.1Mut can bind to the 
endogenous Kir2.1 to form non-conducting channels”, acting as a dominant 
negative to decrease the inward current density at −125 mV (P = 0.004) 
but without affecting the outward current density at −45 mV (P = 0.2). 
e, Membrane potentials (upper panels) in response to current injections 
(lower panels) from an untransfected control pyramidal cell, a pyramidal cell 
overexpressing Kir2.1 and a pyramidal cell overexpressing Kir2.1Mut. 
f-h, Overexpression of Kir2.1 hyperpolarized the resting membrane potential 
(f, P = 0.0003), decreased the resting input resistance (g, P < 0.0001) and 
increased the rheobase current (h, P < 0.0001). i-k, Overexpression of 
Kir2.1Mut increased the resting input resistance (j, P = 0.0002), but had no 
effects on the resting membrane potential (i, P = 0.5) and the rheobase current 
(k, P = 0.9). The numbers of recorded neurons are indicated on the bars. All 
data are expressed as mean ± s.e.m. 

Extended Data Figure 5 | Overexpression of Kir2.1Mut or mNaChBacMut 
in layer 2/3 pyramidal cells does not affect inhibition. a, Left, schematic of 
experiments. Scnn1a-Cre-Tg3 mice with ChR2 in layer 4 excitatory neurons 
and Kir2.1Mut in a subset of layer 2/3 pyramidal cells. Right, monosynaptic 
EPSCs and disynaptic IPSCs from simultaneously recorded control and 
Kir2.1Mut neurons in response to layer 4 photoactivation. b-d, Summary 
graphs. b, Left, EPSC amplitudes in Kir2.1Mut neurons plotted against those in 
control neurons. Right, logarithm of the ratio between EPSC amplitudes in 
Kir2.1Mut and control neurons. Red, mean ± s.e.m. EPSC amplitudes are 
similar between Kir2.1Mut and control neurons (n = 23, P = 0.7). c, As in 
b, but for IPSCs. IPSC amplitudes are similar between Kir2.1Mut and control 
neurons (n = 22, P = 0.6). d, As in b, but for E/I ratios. E/I ratios are similar 
between Kir2.1Mut and control neurons (n = 22, P = 0.6). e, Left, schematic of 
experiments. Pvalb-ires-Cre mice with ChR2 in Pvalb cells and Kir2.1Mut in a 
subset of layer 2/3 pyramidal cells. Right, IPSCs from simultaneously recorded 
control and Kir2.1Mut neurons in response to Pvalb cell photoactivation. 
f, Summary graphs. Left, IPSC amplitudes in Kir2.1Mut neurons plotted 
against those in control neurons. Right, logarithm of the ratio between IPSC 
amplitudes in Kir2.1Mut and control neurons. Red, mean ± s.e.m. IPSC 
amplitudes are similar between Kir2.1Mut and control neurons (n = 14, 
P = 0.8). g, h, As in e, f, but for a non-conducting mutant mNaChBac 
(mNaChBacMut). IPSC amplitudes are similar between mNaChBacMut and 
control neurons (n = 16, P = 0.9). 

Extended Data Figure 6 | Overexpression of mNaChBac increases neuronal 
excitability. a, Schematics of experiments. mNaChBac or a non-conducting 
mutant mNaChBac (mNaChBacMut) was overexpressed in a subset of layer 
2/3 pyramidal cells by in utero electroporation. b, Membrane currents (upper 
and middle panels) in response to voltage steps (lower panels) from an 
untransfected control pyramidal cell, a pyramidal cell overexpressing 
mNaChBac and a pyramidal cell overexpressing mNaChBacMut. The 
endogenous voltage-gated inward Na⁺ current was fast inactivating and was 
blocked by tetrodotoxin (TTX, 1 µM), whereas the mNaChBac-mediated 
inward current was slow inactivating and insensitive to TTX. Inset, overlay of 
the two dashed boxes. Note that the fast component of the inward current 
representing the endogenous Na⁺ current was blocked by TTX. c, Membrane 
potentials (upper panels) in response to current injections (lower panels) 
from a control neuron, a mNaChBac neuron and a mNaChBacMut neuron. 
The mNaChBac neuron generated long-lasting action potentials and 
depolarizations, whereas the mNaChBacMut neuron generated action 
potentials similar to the control neuron. d, e, Overexpression of mNaChBac 
lowered the action potential threshold (defined as the membrane potential 
whose derivative reaches 2 V s⁻¹) (d, P = 0.004) and decreased the rheobase 
current (e, P = 0.03). f, g, Overexpression of mNaChBacMut did not alter the 
action potential threshold (f, P = 0.9) and the rheobase current (g, P = 0.8). 
The numbers of recorded neurons are indicated on the bars. All data are 
expressed as mean + s.e.m. 

Extended Data Figure 7 | Postnatal expression of mNaChBac and Kir2.1 
using Flpo and F-FLEX switch. a, Constitutive overexpression of mNaChBac 
causes a neuronal migration defect. mNaChBac or mNaChBacMut was 
overexpressed in a subset of pyramidal cells by in utero electroporation of 
pCAG-mNaChBac-T2A-tdTomato or pCAG-mNaChBacMut-T2A-tdTomato, 
respectively, on embryonic day 15.5 (E15.5). Representative fluorescent images 
of coronal sections of V1 obtained at postnatal day 16 or 17 showing that 
mNaChBac-expressing neurons (left panels) resided not only in layer 2/3, but 
also in layers 4-6 (n = 7 mice), whereas mNaChBacMut-expressing neurons 
(right panels) are all located in layer 2/3 (n = 5 mice). Cortical layers are 
indicated on the right based on the DAPI staining pattern. b, Experimental 
procedures for conditional expression of mNaChBac or Kir2.1 in a subset of 
layer 2/3 pyramidal cells. Left, plasmids pAAV-EF1α-F-FLEX-mNaChBac- 
T2A-tdTomato or pAAV-EF1α-F-FLEX-Kir2.1-T2A-tdTomato together with 
pCAG-EGFP were electroporated in utero into V1 on embryonic day 15.5. 
Successful transfection is indicated by the expression of EGFP. Middle, AAV- 
hSynapsin-Flpo was injected postnatally into V1. Right, only those neurons that 
were transfected with either pAAV-EF1α-F-FLEX-mNaChBac-T2A-tdTomato 
or pAAV-EF1α-F-FLEX-Kir2.1-T2A-tdTomato and infected with AAV- 
hSynapsin-Flpo expressed mNaChBac-T2A-tdTomato or Kir2.1-T2A- 
tdTomato, respectively. c, Representative fluorescent images of coronal 
sections of V1 obtained at postnatal day 16 showing that without injection 
of AAV-hSynapsin-Flpo transfected neurons did not express mNaChBac- 
T2A-tdTomato (left panels, n = 2 mice). The expression of mNaChBac- 
T2A-tdTomato in transfected neurons was turned on by injection of 
AAV-hSynapsin-Flpo. These neurons were all properly located in layer 2/3 
(right panels, n = 7 mice). Cortical layers are indicated on the right based 
on the DAPI staining pattern. d, Schematics of concurrent expression of 
mNaChBac or Kir2.1 in layer 2/3 pyramidal cells and ChR2 in Pvalb or Sst cells. 
Plasmids pAAV-EF1a-F-FLEX-mNaChBac-T2A-tdTomato or pAAV-EF1a- 
F-FLEX-Kir2.1-T2A-tdTomato were electroporated in utero together with 
pCAG-EGFP into V1 of Pvalb-ires-Cre or Sst-ires-Cre mice on embryonic day 
15.5. AAV-EF1a-DIO-hChR2(H134R)-EYFP and AAV-hSynapsin-Flpo 
were injected postnatally into V1. ChR2 was conditionally expressed in Pvalb 
or Sst cells, whereas mNaChBac or Kir2.1 was conditionally expressed in a 
subset of layer 2/3 pyramidal cells. 
Extended Data Figure 8 | A Flpo recombinase-mediated FLEX (F-FLEX) 
switch for conditional gene expression. a, DNA sequence of the F-FLEX 
switch cassette. The first F14 site and Frt site were constructed in the forward 
direction and were separated by a 50-base-pair linker. The second F14 site and 
Frt site were constructed in the reverse direction and were separated by another 
50-base-pair linker. Multiple cloning sites were inserted between the first Frt 
site and the second F14 site. b, Principle of F-FLEX switch. The gene of interest 
is inserted between the first Frt site and the second F14 site of the F-FLEX switch 
cassette in an inverted orientation, and is driven by an EF1a promoter. Flpo- 
recombinase-mediated recombination first occurs between the two F14 sites or 
the two Frt sites that are in the opposite direction, leading to a reversible 
inversion of the inverted gene of interest. Flpo-mediated recombination then 
occurs between the two F14 sites or the two Frt sites that are now in the same 
direction, excising the Frt site or the F14 site between them, respectively. The 
resulting construct contains only one F14 site and one Frt site, and the gene of 
interest is permanently locked in the forward orientation. c, Flpo turns on 
F-FLEX switch. HEK cells were transfected with (1) Flpo, (2) F-FLEX- 
mNaChBac-T2A-tdTomato, (3) Flpo and F-FLEX-mNaChBac-T2A- 
tdTomato or (4) Cre and F-FLEX-mNaChBac-T2A-tdTomato. EGFP was 
co-transfected to monitor the transfection. There was no leaky expression 
of mNaChBac-T2A-tdTomato in the absence of Flpo. mNaChBac-T2A- 
tdTomato expression was switched on by the expression of Flpo, but not by Cre. 
Similar results were obtained with other F-FLEX constructs (n = 5). d, Flpo 
does not turn on Cre-dependent DIO switch. HEK cells were transfected with 
(1) Cre, (2) DIO-hChR2(H134R)-EYFP, (3) Cre and DIO-hChR2(H134R)- 
EYFP or (4) Flpo and DIO-hChR2(H134R)-EYFP. mRFP was co-transfected to 
monitor the transfection. There was no leaky expression of hChR2(H134R)- 
EYFP in the absence of Cre. hChR2(H134R)-EYFP expression was switched on 
by the expression of Cre, but not by Flpo. Similar results were obtained with 
other DIO constructs (n = 2). 
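The two-step logic of the F-FLEX switch described in panel b (a reversible inversion followed by an irreversible excision) can be traced with a toy model. This is a minimal sketch under simplifying assumptions: the cassette is reduced to an ordered list of oriented elements, and the function names and the rule of always taking an excision when one is available are illustrative choices, not part of the published construct design.

```python
# Each element is (name, orientation): +1 forward, -1 reverse.
SITES = ("F14", "Frt")


def invert(cassette, i, j):
    """Invert the segment strictly between two opposite-facing sites i and j."""
    middle = [(name, -o) for name, o in reversed(cassette[i + 1:j])]
    return cassette[:i + 1] + middle + cassette[j:]


def excise(cassette, i, j):
    """Remove the segment between two same-facing sites, keeping one site."""
    return cassette[:i + 1] + cassette[j + 1:]


def site_pairs(cassette):
    """Map each site type that is present exactly twice to its two positions."""
    pairs = {}
    for site in SITES:
        idx = [k for k, (name, _) in enumerate(cassette) if name == site]
        if len(idx) == 2:
            pairs[site] = (idx[0], idx[1])
    return pairs


def recombine(cassette, first_inversion):
    """Apply Flpo-like events until no pair of identical sites remains."""
    for _ in range(10):  # guard against endless flip-flopping
        pairs = site_pairs(cassette)
        same = [(i, j) for i, j in pairs.values()
                if cassette[i][1] == cassette[j][1]]
        if same:  # excision is irreversible, so take it when available
            cassette = excise(cassette, *same[0])
        elif first_inversion in pairs:
            cassette = invert(cassette, *pairs[first_inversion])
        else:
            break
    return cassette


# Layout from the legend: F14 and Frt forward, inverted gene, F14 and Frt reverse.
start = [("F14", +1), ("Frt", +1), ("GENE", -1), ("F14", -1), ("Frt", -1)]
via_f14 = recombine(start, "F14")
via_frt = recombine(start, "Frt")
```

Whichever pair of sites inverts first, the model ends with one F14 site, one Frt site and the gene in the forward orientation, which is the locked state the legend describes.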
Extended Data Figure 9 | Overexpression of Kir2.1 in a small subset of layer 
2/3 pyramidal cells does not affect Pvalb-cell-mediated inhibition onto 
untransfected pyramidal cells. a, Schematic of experiments. Unitary 
connection from a Pvalb cell onto nearby layer 2/3 pyramidal cells in control 
mice (left) and onto untransfected pyramidal cells in mice that were 
electroporated in utero with pCAG-Kir2.1-T2A-tdTomato (right). 
b, Connectivity rates from Pvalb cells to layer 2/3 pyramidal cells in control 
mice (95%, 57 out of 60) and to untransfected pyramidal cells in electroporated 
mice (93%, 52 out of 56) are similar (P = 0.7). c, Cumulative frequencies for 
uIPSC amplitudes (control: n = 57, median, 224.0 pA; untransfected: n = 52, 
median, 190.4 pA; P = 0.5). Inset, mean ± s.e.m. d, Summary graph for the 
average relative deviations of uIPSCs from 20 and 17 similar experiments as in 
a. Bars, mean ± s.e.m. (P = 0.6). 
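The connectivity comparison in panel b can be checked from the counts given in the legend. The legend does not say which test produced P = 0.7; the sketch below assumes a two-sided Fisher's exact test purely for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


control = (57, 3)        # connected, not connected (57 out of 60)
untransfected = (52, 4)  # connected, not connected (52 out of 56)
rates = (control[0] / sum(control), untransfected[0] / sum(untransfected))
p_value = fisher_exact_two_sided(*control, *untransfected)
```

Under this assumed test the two rates (about 95% and 93%) are not distinguishable, in line with the reported conclusion that connectivity is similar.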
Extended Data Figure 10 | A model for inter-cell variability of Pvalb-cell- 
mediated inhibition. Schematic illustration of how pyramidal cell activity 
regulates the inter-cell variability of Pvalb-cell-mediated inhibition. Left, 
pyramidal cells with different activity levels (dark and light colours indicate 
high and low activity, respectively) receive different amounts of Pvalb-cell- 
mediated inhibition (long and short bars indicate more or less inhibition, 
respectively). Inhibition consists of an activity-dependent component (green 
bars) and an activity-independent component (blue bars). The activity- 
dependent component is positively regulated by the pyramidal cell activity and 
varies accordingly, whereas the activity-independent component is similar 
across neurons. Right, when the activity of pyramidal cells is suppressed by 
overexpression of Kir2.1, the activity-dependent component is diminished and 
the remaining inhibition is largely the activity-independent component. This 
flooring effect reduces the variability of uIPSC amplitudes among Kir2.1- 
expressing neurons. 
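The flooring effect in this model can be illustrated numerically. The sketch below assumes, for illustration only, that the activity-dependent component scales linearly with activity; the gain, the baseline and the activity values are hypothetical and are not estimates from the data.

```python
from statistics import pstdev


def inhibition(activity, gain=100.0, baseline=80.0):
    """Inhibition (arbitrary pA) as the sum of an activity-dependent part
    (gain * activity) and an activity-independent part (baseline)."""
    return gain * activity + baseline


control_activity = [0.2, 0.5, 0.9, 1.4, 2.0]                # hypothetical
suppressed_activity = [0.1 * a for a in control_activity]   # Kir2.1-like suppression

control_inh = [inhibition(a) for a in control_activity]
suppressed_inh = [inhibition(a) for a in suppressed_activity]

spread_control = pstdev(control_inh)
spread_suppressed = pstdev(suppressed_inh)
```

With activity scaled down tenfold, the spread of inhibition across cells shrinks by the same factor while every cell keeps at least the baseline component, which is the behaviour the schematic describes.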
Type I interferon responses in rhesus macaques 
prevent SIV infection and slow disease progression 


Netanya G. Sandler, Steven E. Bosinger, Jacob D. Estes, Richard T. R. Zhu, Gregory K. Tharp, Eli Boritz, Doron Levin, 
Sathi Wijeyesinghe, Krystelle Nganou Makamdop, Gregory Q. del Prete, Brenna J. Hill, J. Katherina Timmer, Emma Reiss, 
Ganit Yarden, Samuel Darko, Eduardo Contijoch, John Paul Todd, Guido Silvestri, Martha Nason, Robert B. Norgren Jr, 
Brandon F. Keele, Srinivas Rao, Jerome A. Langer, Jeffrey D. Lifson, Gideon Schreiber & Daniel C. Douek 


Inflammation in HIV infection is predictive of non-AIDS morbidity 
and death¹, higher set point plasma virus load² and virus acquisition³; 
thus, therapeutic agents are in development to reduce its causes and 
consequences. However, inflammation may simultaneously confer 
both detrimental and beneficial effects. This dichotomy is particularly 
applicable to type I interferons (IFN-I) which, while contributing to 
innate control of infection, also provide target cells for the virus 
during acute infection, impair CD4 T-cell recovery, and are associated 
with disease progression. Here we manipulated IFN-I signalling 
in rhesus macaques (Macaca mulatta) during simian immunodefi- 
ciency virus (SIV) transmission and acute infection with two comple- 
mentary in vivo interventions. We show that blockade of the IFN-I 
receptor caused reduced antiviral gene expression, increased SIV res- 
ervoir size and accelerated CD4 T-cell depletion with progression to 
AIDS despite decreased T-cell activation. In contrast, IFN-α2a 
administration initially upregulated expression of antiviral genes and 
prevented systemic infection. However, continued IFN-α2a treatment 
induced IFN-I desensitization and decreased antiviral gene expression, 
enabling infection with increased SIV reservoir size and accelerated 
CD4 T-cell loss. Thus, the timing of IFN-induced innate responses in 
acute SIV infection profoundly affects overall disease course and out- 
weighs the detrimental consequences of increased immune activation. 
Yet, the clinical consequences of manipulation of IFN signalling are 
difficult to predict in vivo and therapeutic interventions in human 
studies should be approached with caution. 

We designed and produced an IFN-I receptor antagonist (IFN-1ant) 
that blocks IFN-α2 antiviral and antiproliferative activity in vitro. Six 
rhesus macaques received 1 mg of IFN-1ant daily for 4 weeks following 
intrarectal challenge with SIVmac251 (dosage based on previous 
dose-response studies; Extended Data Fig. 1a-d); nine macaques received 
saline (Extended Data Fig. 1e). Initial assessment of in vivo effects revealed 
delayed peak mRNA expression of MX1 and OAS2 in the IFN-1ant macaques 
(Extended Data Fig. 2a, b), but peak expression levels did not differ 
between cohorts. Whole-transcriptome sequencing revealed that expression 
of most interferon-stimulated genes (ISGs) in peripheral blood mononuclear 
cells (PBMCs) was significantly decreased at 7 days post-infection 
(d.p.i.) in the IFN-1ant-treated compared to placebo-treated macaques 
(Fig. 1a), including the antiviral genes APOBEC3G and MX2, those that 
code for cyclic GMP-AMP synthase (cGAS) and tetherin, and IRF7, 
a master IFN-I signalling inducer, indicating profound disruption of 
IFN-I signalling (Fig. 1b). Most ISGs in the IFN-1ant group normalized 
at 10 and 21 d.p.i. and were upregulated at 28 and 84 d.p.i. (Extended Data 
Fig. 2c). Consistent with transcriptional data (Extended Data Fig. 2d, e), 


APOBEC3G, TRIM5α and MX2 protein expression by quantitative 
immunohistochemistry was significantly attenuated in lymph nodes at 4 weeks 
post-infection (w.p.i.) compared to placebo (Fig. 1c). Thus, IFN-1ant 
treatment during acute SIV infection resulted in delayed and decreased 
antiviral gene and protein expression in peripheral blood and lymph nodes. 

Consistent with reduced antiviral gene expression, IFN-1ant macaques 
had significantly higher plasma viral loads (pVLs) than placebo macaques 
during acute infection (Fig. 2a) and after 20 w.p.i. despite similar numbers 
of transmitted/founder viruses (measured 10 d.p.i., Extended Data Fig. 8a). 
Delayed peak ISG expression, however, was predictive of higher pVLs at 
peak and 12 w.p.i. and higher PBMC-associated SIV gag DNA levels at 
28 d.p.i. (Extended Data Fig. 2f-h). Additionally, the number of lymph node 
SIV RNA⁺ cells per mm² as determined by in situ hybridization was 
significantly higher in macaques treated with IFN-1ant compared to placebo 
during chronic infection (Fig. 2b). Thus, early IFN-I signalling was critical 
for early and long-term control of SIV replication and virus reservoir size. 

Although both groups experienced a similar, significant decrease in 
circulating CD4 T-cell frequency (Fig. 2c) and CD4/CD8 T-cell ratio 
(Extended Data Fig. 3a) between 0 and 12 w.p.i., IFN-1ant macaques experi- 
enced a profound decline with a lower lymph node CD4 T-cell frequency 
and CD4/CD8 T-cell ratio beyond 12 w.p.i. (Fig. 2d and Extended Data 
Fig. 3b). The frequency of CCR5⁺ memory CD4 T cells, potential targets 
for infection, was significantly lower in blood in IFN-1ant-treated than 
placebo-treated rhesus macaques through 12 w.p.i. (Fig. 2e), and in lymph 
nodes at 4 and >12 w.p.i. (Fig. 2f), suggesting depletion due to infection. 
Circulating T-cell activation, reflected by HLA-DR⁺ and Ki67⁺ memory 
CD4 and CD8 T-cell frequencies, was not significantly different between 
groups at 4 or >12 w.p.i. (Supplementary Information). However, 
HLA-DR⁺ and Ki67⁺ memory CD4 and CD8 T-cell frequencies were 
significantly lower in the lymph nodes of IFN-1ant macaques than placebo at 
>12 w.p.i. (Extended Data Fig. 3c-f). Taken together, IFN-I signalling 
blockade during acute SIV infection resulted in attenuated T-cell activa- 
tion in lymphoid tissue yet accelerated CD4 T-cell depletion. 

Clinical outcome ultimately gives the most comprehensive measure 
of disease state. Consistent with a median life expectancy of 1 year, the six 
placebo-treated macaques followed through 44 w.p.i. (three were 
transferred to another study before 30 w.p.i.) lived, but the IFN-1ant macaques 
began dying of AIDS at 24 w.p.i. and all were euthanized per protocol for 
signs of AIDS by 30 w.p.i. (Fig. 2g). Thus, blocking IFN-I signalling during 
only the first 4 weeks of infection resulted in accelerated disease progres- 
sion and death from AIDS. 

Exploration of the molecular mechanisms underlying the accelerated 
disease progression by whole-transcriptome sequencing revealed statistically 
Figure 1 | IFN-1ant suppresses early antiviral responses. a, Expression of 
ISGs in macaques treated with IFN-1ant (n = 6) or placebo saline (n = 9) 
7 days after SIV infection. FPKM (log-transformed fragments per kilobase of 
transcript per million fragments sequenced) reflects the relative abundance of 
transcripts. P values indicate differentially expressed genes at 7 d.p.i. 
b, Expression assessed by RNA sequencing (RNA-seq) of antiviral genes 
APOBEC3G, MX2 and those that code for cGAS and tetherin in PBMCs before 
and 7 days after SIV infection in macaques that received IFN-1ant (Ant, n = 6) 
or placebo (Plac, n = 9) injections. Error bars indicate range. P values were 
calculated by Mann-Whitney U test. c, APOBEC3G, TRIM5α and MX2 


significant enrichment of pathways regulating innate immunity, IFN-I 
production and T- and B-lymphocyte activation (Extended Data Fig. 
4a-c) with significant downregulation of most genes in the IFN-1ant 
group at 7 d.p.i. compared to placebo-treated controls (Fig. 1d and Extended 
Data Fig. 2c). Relative to placebo, the most significantly perturbed 
pathway in the IFN-1ant-treated animals consisted of pathogen-associated 
pattern recognition receptor (PRR) signalling molecules (Fig. 1d and 
Extended Data Fig. 4a), with significant downregulation of several viral 
PRRs (TLR3, TLR7, DDX58/RIG-I, MDA5/IFIH1) and their downstream 
adaptors (TICAM1/TRIF) or transcription factors (IRF7) in IFN-1ant 
macaques compared to placebo (Extended Data Fig. 4c). Concordantly, 
expression of the downstream mediators IL-6, TNF and IL-1β was 
significantly reduced. 

Consistent with their responsiveness to IFN-I, the frequencies of total 
and cytotoxic CD16⁺ natural killer (NK) cells were significantly lower in 
the IFN-1ant group than placebo at >12 w.p.i., although there were no 
differences at 4 w.p.i. (Extended Data Fig. 3g-i). However, we observed no 
significant differences in phenotype, function or timing of CD4 or CD8 T-cell 
responses (Extended Data Fig. 5 and Supplementary Information). 

Collectively, these data suggest that IFN-I signalling early in SIV infec- 
tion is critical for innate immune control of virus replication and that its 
antagonism, even if only brief during the acute phase, results in decreased 
virus control, accelerated CD4 T-cell depletion and progression to AIDS. 

Given these findings, we hypothesized that administering IFN-I could 
improve SIVmac251 control despite evidence suggesting that inflammation 
exacerbates virus acquisition and disease progression. Six macaques 
received 6 µg kg⁻¹ pegylated IFN-α2a dosed weekly, as determined in 
prior studies, starting 1 week before challenge and continued through 
4 weeks after systemic infection (defined as detectable pVL 7 days 
post-challenge). Macaques were followed until 12 w.p.i. then euthanized per 
protocol (Extended Data Fig. 1e). Whereas all nine placebo macaques were 
infected after the first intrarectal inoculation, IFN-α2a treatment 
significantly delayed systemic infection, necessitating two, three or five challenges 
to infect these macaques (Fig. 3a), and significantly decreased the number 
of transmitted/founder variants (Extended Data Fig. 8a). The macaques 
that required more challenges had fewer transmitted/founder variants 
(Fig. 3b); however, the circulating viruses at peak viral load in both groups 
d, Expression of genes involved in pattern recognition receptor signalling of 
IFN-1ant-treated (n = 6) macaques compared to placebo (n = 9). P values 
represent the differential expression between IFN-1ant and placebo macaques 
at 7 d.p.i. For all panels, IFN-1ant-treated macaques are represented in red, 
placebo-treated macaques in blue. 
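The FPKM measure defined in the Figure 1 legend follows directly from its name. The sketch below is illustrative only: the counts are hypothetical, and because the legend does not state the base of the logarithm or whether a pseudocount was used, the log2(FPKM + 1) transform shown here is an assumption.

```python
import math


def fpkm(fragments, transcript_length_bp, total_fragments):
    """Fragments per kilobase of transcript per million fragments sequenced."""
    kilobases = transcript_length_bp / 1_000
    millions = total_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical example: 500 fragments mapped to a 2,000-bp transcript in a
# library of 20 million fragments.
value = fpkm(500, 2_000, 20_000_000)   # 500 / (2 * 20)
log_value = math.log2(value + 1)       # assumed transform, see lead-in
```

Normalizing by both transcript length and library size is what allows expression to be compared across genes and across animals sequenced to different depths.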


were equally susceptible to in vitro IFN-α inhibition (data not shown). 
Thus, treatment with IFN-α2a during SIV challenge increased host 
resistance to systemic infection. 

We assessed whether ISGs may have contributed to resistance to 
infection. We found that MX1 and OAS2 gene expression detected by 
quantitative reverse transcription PCR (qRT-PCR) increased after one IFN-α2a 
dose but decreased after repeated administration, suggesting an 
IFN-desensitized state (Extended Data Fig. 6a-d) resulting in no significant 
differences between treated and placebo groups on the day of infectious 
challenge. Furthermore, IFN-α2a macaques had lower ISG expression 
compared to placebo at 7 and 10 d.p.i., including those encoding cGAS, 
APOBEC3G, MX2 and tetherin (Fig. 3c-f and Extended Data Fig. 7a-b). 
At 4 w.p.i., after 6, 7 or 9 doses of IFN-α2a, lymph node TRIM5α protein 
expression was significantly lower in the IFN-α2a group compared to 
placebo (Extended Data Fig. 8b). To explore the effects of IFN-α2a 
administration on ISG expression further, we performed whole-transcriptome 
sequencing on uninfected rhesus macaques administered pegylated 
IFN-α2a for 3 weeks. Seven days after one IFN-α2a dose, expression of virus 
restriction factors including TRIM22, MX2 and IRF7 was increased in 
PBMCs, lymph nodes and rectum (Extended Data Fig. 7c-h). However, 
after 3 doses these ISGs returned to pre-treatment expression levels (MX2) 
or lower (TRIM22 and IRF7), consistent with the timing of infection of the 
SIV-challenged macaques (Extended Data Fig. 7c-h). We found no in 
vitro IFN-neutralizing activity in the plasma at the time of infection to 
explain the loss of exogenous IFN-α2a activity (Extended Data Fig. 6e 
and Supplementary Information), suggesting that ISG downregulation 
probably occurred as a result of intrinsic regulatory mechanisms. Indeed, 
FOXO3a, a central regulator of IFN-I feedback, was significantly 
upregulated in the IFN-α2a compared to placebo macaques at 7 d.p.i. (Fig. 3g 
and Extended Data Fig. 6g) and, concordantly, ISGs demonstrated to be 
repressibly bound by FOXO3a (ref. 25) had lower expression at 7 d.p.i. 
(Extended Data Fig. 6h). We also assessed a broad panel of genes predicted 
to be bound by FOXO3a (ref. 25) using gene-set enrichment analysis and 
observed a significantly lower cumulative ranking of FOXO3a targets in 
the 7 d.p.i. IFN-α2a-treated macaque transcriptome data when contrasted 
to the 7 d.p.i. placebo data (Fig. 3h), suggesting increased 
FOXO3a-mediated repression in the IFN-α2a macaques. Indeed, after 21 days of 
Figure 2 | IFN-1ant accelerates disease progression in SIV-infected rhesus 
macaques. a, Plasma SIV RNA levels during acute and chronic SIV infection in 
macaques treated with IFN-1ant (n = 6) or placebo saline (n = 9). *P < 0.05. 
Shading indicates treatment period. P value represents the comparison between 
groups of the areas under the curve (AUC) (0-4 w.p.i.). b, SIV RNA-containing 
cells in the lymph nodes by in situ hybridization at 4 and 12 w.p.i. in IFN-1ant 
(Ant, n = 6) and placebo (Plac, n = 6) macaques. Horizontal bars represent 
median values. P value was calculated by Mann-Whitney U test. c, Frequency of 
CD4 T cells in peripheral blood during acute and chronic SIV infection in 
macaques treated with IFN-1ant (n = 6) or placebo saline (n = 9). Error bars 
indicate range. Red vertical line indicates day 0 of systemic SIV infection. Shading 
indicates treatment period. P value represents the comparison between groups of 
the AUC (12-32 w.p.i.). d, Frequency of CD4 T cells in lymph nodes before SIV 
infection and at 4 or >12 w.p.i. for IFN-1ant (Ant, n = 6) and placebo (Plac, 
n = 9) macaques. Horizontal bars represent median values. P values at different 
time points within treatment groups were calculated by Wilcoxon matched pairs 
signed rank test and between groups by Mann-Whitney U test. e, Frequency of 
CCR5⁺ memory (CD28⁺CD95⁺ or CD28⁻CD95⁺) CD4 T cells in peripheral 
blood in macaques treated with IFN-1ant (n = 6) or placebo saline (n = 9). Error 
bars indicate range. Red vertical line indicates day 0 of systemic SIV infection. 
Shading indicates treatment period. P values represent the comparison between 
groups of the AUC (0-12 w.p.i. and 4-12 w.p.i.). f, Frequency of CCR5⁺ memory 
CD4 T cells in lymph nodes in macaques treated with IFN-1ant (n = 6) or placebo 
saline (n = 9). Horizontal bars represent median values. P values at different time 
points within treatment groups were calculated by Wilcoxon matched pairs 
signed rank test and between groups by Mann-Whitney U test. g, Kaplan-Meier 
survival curve comparing macaques treated with IFN-1ant (n = 6) to macaques 
that received placebo (n = 9). P value indicates the significance by logrank 
(Mantel-Cox) test for survival by 32 w.p.i. For all panels, IFN-1ant-treated 
macaques are represented in red, placebo-treated macaques in blue. 
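Several panels in Figure 2 compare groups by the area under the curve (AUC) over a window of weeks post-infection. The legend does not specify how the AUC was computed; the sketch below assumes the trapezoidal rule, and the sampling schedule and CD4 values are hypothetical.

```python
def auc_trapezoid(times, values, start, end):
    """Area under a piecewise-linear curve restricted to [start, end].

    Assumes start and end coincide with sampled time points.
    """
    pts = [(t, v) for t, v in zip(times, values) if start <= t <= end]
    return sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))


weeks = [0, 2, 4, 8, 12]                   # hypothetical sampling schedule (w.p.i.)
cd4_pct = [40.0, 30.0, 28.0, 25.0, 24.0]   # hypothetical CD4 T-cell frequencies (%)

auc_0_4 = auc_trapezoid(weeks, cd4_pct, 0, 4)
auc_0_12 = auc_trapezoid(weeks, cd4_pct, 0, 12)
```

One AUC per animal reduces each longitudinal curve to a single number, so the two groups can then be compared with a rank-based test such as the Mann-Whitney U test named in the legend.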


IFN-α2a treatment in the unchallenged macaques, increased FOXO3a 
expression was associated with ISG downregulation (Extended Data 
Fig. 6f). While these data do not exclude additional mechanisms, they 
suggest that ISG downregulation in the IFN-α2a-treated macaques was a 
consequence of endogenous homeostatic control rather than neutralization 
of exogenous IFN-α2a. Thus, exogenous augmentation of IFN-I signalling 
was associated with enhanced protection against SIV acquisition, but 
susceptibility to exacerbated systemic infection once ISG expression waned. 
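The gene-set enrichment result above rests on a running-sum statistic over a ranked gene list. The sketch below shows an unweighted variant of that statistic for illustration; published gene-set enrichment analysis typically weights each step by the gene's ranking metric, and the gene names and ranking here are hypothetical.

```python
def running_enrichment(ranked_genes, gene_set):
    """Unweighted running sum over a ranked gene list.

    Steps up at gene-set members and down at all other genes. Returns the
    running sums and the enrichment score (the deviation from zero of
    largest magnitude). Assumes at least one hit and at least one miss.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, total = [], 0.0
    for is_hit in hits:
        total += 1.0 / n_hit if is_hit else -1.0 / n_miss
        running.append(total)
    score = max(running, key=abs)
    return running, score


# Hypothetical ranking, most upregulated in treated animals first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
# Hypothetical target set clustered at the bottom of the ranking.
targets = {"g6", "g7", "g8"}
running, es = running_enrichment(ranked, targets)
```

A strongly negative score means the target genes sit near the bottom of the ranking, which corresponds to the lower cumulative ranking of FOXO3a targets reported for the treated animals.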
Figure 3 | IFN-α2a treatment transiently prevents systemic infection but 
results in an IFN-tolerant state. a, Kaplan-Meier survival curve comparing 
the number of SIVmac251 rectal challenges required to achieve systemic 
infection in macaques treated with IFN-α2a (n = 6) or placebo saline (n = 9). 
P value indicates the significance by logrank (Mantel-Cox) test of the number of 
challenges required for systemic infection, between 1 and 5 challenges. 
b, Correlation between the number of challenges needed to achieve systemic 
infection and the number of transmitted/founder (T/F) variants in IFN-α2a 
(green, n = 6), IFN-1ant (red, n = 6) and placebo (blue, n = 9) macaques. 
P value indicates the significance of the correlation between the number of 
challenges and the number of T/F variants in all groups. r indicates the 
Spearman’s rank correlation coefficient. c-f, Expression of antiviral mediators in 
PBMCs in IFN-α2a-treated (IFN, n = 6) macaques compared to placebo (Plac, 
n = 9) at 10 d.p.i. Error bars indicate range. P values represent the comparison of 
FPKMs between IFN-α2a and placebo at 10 d.p.i. by Mann-Whitney U test. 
g, Expression profile of FOXO3a, a negative regulator of type I IFN signalling. 
P value represents the comparison of FOXO3a FPKM between IFN-α2a (n = 6) 
and placebo (n = 9) macaques at 7 d.p.i. h, Gene-set enrichment analysis in 
IFN-α2a (n = 6) and placebo (n = 9) macaques of genes previously 
demonstrated to be overexpressed in FOXO3⁻/⁻ macrophages. The line plot 
indicates the running-sum of the enrichment score; the leading edge is indicated 
in magenta. The relative positions of all genes within the ranked data set are 
shown in the stick plot below the x axis. P value indicates statistical significance of 
the enrichment score, reflecting lower cumulative ranking of FOXO3a targets in 
IFN-α2a-treated macaques compared to placebo at 7 d.p.i. For all panels, 
IFN-α2a-treated macaques are represented in green, placebo-treated macaques 
in blue. 
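Panel a of Figure 3 treats the number of challenges needed for systemic infection as a time-to-event variable. The sketch below is a minimal Kaplan-Meier estimator for such data. The placebo input follows the text (all nine animals infected after the first challenge); the split of the six treated animals across two, three and five challenges is not given in the text and is hypothetical.

```python
def kaplan_meier(event_times, observed):
    """Return [(time, fraction still event-free)] for right-censored data."""
    at_risk = len(event_times)
    surviving = 1.0
    curve = []
    for t in sorted(set(event_times)):
        events = sum(1 for x, o in zip(event_times, observed) if x == t and o)
        leaving = sum(1 for x in event_times if x == t)
        if events:
            surviving *= 1 - events / at_risk
            curve.append((t, surviving))
        at_risk -= leaving  # events and censored animals both leave the risk set
    return curve


# Hypothetical split of the six treated animals (challenge at which infected).
treated = kaplan_meier([2, 2, 3, 3, 5, 5], [True] * 6)
# From the text: all nine placebo animals were infected after the first challenge.
placebo = kaplan_meier([1] * 9, [True] * 9)
```

The logrank test named in the legend compares such curves between groups; it is not implemented here.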


Given their roles in virus control, T and NK cells were evaluated. No 
changes in SIV-specific CD4 or CD8 T-cell responses developed with 
repeated challenges, suggesting that resistance to infection did not depend 
on adequate T-cell responses (Supplementary Information). However, 
higher circulating CD56⁺ NK-cell frequencies after starting IFN-α2a 
predicted more challenges necessary for infection and were associated with 
resistance to infection (Extended Data Fig. 8c). Once the CD56⁺ NK-cell 
Figure 4 | IFN-α2a accelerates disease progression. a, PBMC-associated SIV 
gag DNA at 10, 14 and 28 d.p.i. in IFN-α2a macaques (IFN, n = 6) and placebo 
(Plac, n = 6) macaques. LLQ indicates lower limit of quantification. Horizontal 
bars represent median values. P values were calculated by Mann-Whitney 
U test. b, Expression of differentially expressed genes involved in pattern 
recognition receptor signalling. P values represent the comparison between 
FPKMs of IFN-α2a (n = 6) and placebo (n = 9) macaques at 7 d.p.i. 
c, Frequency of CD4 T cells in peripheral blood during acute and early SIV 
infection in IFN-α2a (n = 6) and placebo (n = 9) macaques. Error bars 
indicate range. Shading indicates treatment period. Red vertical line indicates 
day 0 of systemic SIV infection. d, CD4/CD8 T-cell ratio in peripheral blood 
during acute and early SIV infection in IFN-α2a (n = 6) and placebo (n = 9) 


frequency declined, the macaques became infected. In rectal biopsies at 
28 d.p.i., more resistant macaques had higher CD16⁺ NK-cell frequencies 
(Extended Data Fig. 8d). Together, these data suggest that IFN-α2a-induced 
innate immunity, rather than T-cell responses, protected against SIV infection. 

Despite fewer transmitted/founder variants, pVLs did not differ 
significantly between IFN-α2a and placebo groups (Extended Data Fig. 8e), 
with any difference potentially obscured by the variability and small number 
of macaques. PBMC-associated SIV gag DNA levels, however, were significantly 
higher in the IFN-α2a group than placebo at 10, 14 and 28 d.p.i. (Fig. 4a). While the 
circulating CD4 T-cell frequency (Fig. 4c) and CD4/CD8 T-cell ratio 
(Fig. 4d) declined between 0 and 4 w.p.i. in both groups, the CD4/CD8 
T-cell ratio was significantly lower in the IFN-α2a group (based on area 
under the curve (AUC) (0-12 w.p.i.)). The CCR5⁺ memory CD4 T-cell 
frequency in blood was significantly lower in IFN-α2a than placebo 
macaques during acute (AUC(0-4 w.p.i.), Fig. 4e) and chronic 
infection (AUC(0-12 w.p.i.)) and in lymph nodes at 4 (Fig. 4f) and 12 w.p.i., 
although the frequency of CD4 T cells and CD4/CD8 T-cell ratio in lymph 
nodes and jejunum were similar between groups at 4 and ≥12 w.p.i. 
(data not shown). Thus, SIV-infected IFN-α2a-treated macaques had 
increased CD4 T-cell-associated virus load and greater CD4 T-cell loss 
with preferential depletion of the CCR5⁺ subset. 

We assessed whether increased immune activation was associated 
with this CD4 T-cell loss. The circulating Ki67⁺ memory CD4 and CD8 
T-cell frequencies were lower in the IFN-α2a compared to placebo group 
during acute infection only (AUC(0-4 w.p.i.)) with no differences in the 
frequencies of HLA-DR⁺ memory T cells (Extended Data Fig. 9a-d). In 
the lymph nodes, the frequencies of Ki67⁺ memory CD4 T cells at 4 w.p.i. 
and HLA-DR⁺ memory CD4 T cells at ≥12 w.p.i. were significantly lower 
in the IFN-α2a than placebo group with no differences in CD8 T cells 
(Extended Data Fig. 9e-h). Thus, the IFN-α2a macaques had similar 
or less immune activation compared to placebo macaques. 

We further explored the mechanisms underlying the increased 
cell-associated SIV and CD4 T-cell depletion in IFN-α2a macaques using 
whole-transcriptome sequencing of PBMCs. As with IFN-1ant, 
IFN-α2a administration significantly affected the PRR signalling pathway 
(Extended Data Fig. 10a). C1q, TLR3, TLR7 and RIG-I were 
downregulated in the IFN-α2a macaques, yet expression of IL-6, TNF and IL-1β 
was increased at 7 d.p.i. (Fig. 4b and Extended Data Fig. 10b). Mediators 


macaques. Error bars indicate range. Shading indicates treatment period. Red 
vertical line indicates day 0 of systemic SIV infection. P value represents the 
comparison between groups of the AUC (0-12 w.p.i.). e, Frequency of CCR5⁺ 
memory CD4 T cells in peripheral blood in IFN-α2a (n = 6) and placebo 
(n = 9) macaques. Error bars indicate range. Shading indicates treatment 
period. Red vertical line indicates day 0 of systemic SIV infection. P values 
represent the comparison between groups of the AUC (0-4 and 0-12 w.p.i.). 
f, Frequency of CCR5⁺ memory CD4 T cells in lymph nodes in IFN-α2a 
(n = 6) and placebo (n = 9) macaques. Horizontal bars represent median 
values. P values were calculated by Mann-Whitney U test. For all panels, 
IFN-α2a-treated macaques are represented in green, placebo-treated macaques 
in blue. 
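Most between-group comparisons in these legends use the Mann-Whitney U test. The sketch below computes only the U statistic, by direct pair counting, to show what the test measures; converting U to a P value (exactly or by normal approximation) is omitted, and the two samples are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y).

    U_x counts, over every cross-group pair, 1 when the x value is larger
    and 0.5 for a tie; U_y is the complement, so U_x + U_y = len(x) * len(y).
    """
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


ifn = [12.0, 15.0, 9.0, 11.0]            # hypothetical treated-group values
plac = [20.0, 18.0, 15.0, 25.0, 22.0]    # hypothetical placebo-group values
u_ifn, u_plac = mann_whitney_u(ifn, plac)
```

Because the statistic depends only on the ordering of values, it suits the small group sizes and skewed measurements reported here better than a test that assumes normality.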


of stress responses and cell survival downstream of IL-6 signalling, such 
as MAP2K3, and other anti-apoptotic genes such as DIABLO and BCL2L1 
were upregulated at 7 d.p.i. (Extended Data Fig. 10b, 10c), whereas 
pro-apoptotic genes including CASP10 were downregulated. Thus, early 
IFN-α2a administration increased early expression of proinflammatory 
cytokines despite decreased PRR expression and delayed induction of 
apoptotic pathways. 

We next assessed T- and NK-cell-mediated immunity. IFN-α2a-treated 
macaques had intact and even enhanced SIV-specific CD8 T-cell responses 
with no deficits in CD4 T-cell responses (Extended Data Fig. 8f-i and 
Supplementary Information). Whereas there were no differences at 4 w.p.i., 
between 4 and 12 w.p.i., the CD56⁺ NK-cell frequency increased and 
CD16⁺ NK-cell frequency decreased in the IFN-α2a but not the placebo 
group. The frequencies of CD16⁺ (Extended Data Fig. 8j), CD107a⁺ and 
granzyme B⁺ NK cells at 12 w.p.i. were subsequently lower in the 
IFN-α2a-treated macaques compared to placebo. 

Taken together, these data show that despite initially increasing ISG 
expression and conferring resistance to SIV infection, continued IFN-α2a 
treatment resulted in an IFN-desensitized state, with decreased antiviral gene 
expression, increased susceptibility to infection, increased cell-associated 
virus load and greater CD4 T-cell depletion compared to placebo. 

Thus, both IFN-I receptor blockade and IFN-α2a administration 
ultimately resulted in decreased and delayed IFN-I responses, the bio- 
logical outcomes of which reveal a pivotal role for IFN-I signalling in 
acute retroviral exposure in primates that overshadows the potential 
harm of increased inflammation. That a delay of as few as 3 days in 
antiviral gene expression resulted in accelerated disease progression 
suggests that SIV disease course is determined very early and depends 
upon the precise timing of peak antiviral activity. The observation of 
shortened time-to-death despite the eventual normalization of ISG 
expression during chronic infection clearly shows that a resurgence 
of late antiviral activity cannot compensate for compromised early virus 
control. Indeed, administration of IFN-α in chronic HIV infection has 
given inconsistent results for virus load and CD4 T-cell counts with no 
effect on disease outcome. An interesting parallel to our study was 
recently described in mice with chronic lymphocytic choriomeningitis 
virus infection where persistent IFN-I signalling exerts antiviral effects 
but also leads to immune hyperactivation and suppression of antiviral 
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T-cell responses'*”*. Our findings differ from a previous study that reported 
unchanged virus burden with IFN-«2a administration during chronic 
SIV infection™* and a study of IFN-«2b or the IFN-«B/D chimaera in 
rhesus macaques challenged intravenously with highly pathogenic SIV 
Deltage7o that showed no protection from infection but decreased peak 
antigenaemia”®. The initial protection from infection and decrease in 
transmitted/founder variants observed in our study highlight the rectal
mucosa’s function as a barrier to SIV during transmission in the context
of ISG induction. It is tempting to speculate that administration of
IFN-α2a before SIV challenge also facilitated NK-cell activation and
recruitment to the rectum and contributed to protection from infection.
Given the relative IFN-I resistance of transmitted/founder viruses and
the induction of an IFN-desensitized state associated with the upregulation
of the IFN-I pathway repressor FOXO3a (ref. 25), our findings
add a cautionary note to adjuvanted HIV vaccines or other prevention
approaches that induce ISGs at mucosal surfaces. Furthermore, while
the rectum contains many resident target CD4 T cells that antiviral
mediators can protect, sites with few resident target cells, such as the
female genital tract, may depend on IFN-I signalling for CD4 T-cell
recruitment and virus propagation. Thus, inflammation might attenuate
transmission at the former site but exacerbate it at the latter. In conclusion,
disease progression in HIV infection emerges from the balance
between the beneficial antiviral effects of inflammation, its detrimental 
systemic and immunologic effects, and its unique role of providing acti- 
vated CD4 T-cell targets for HIV. Interfering with one part of this unstable 
equilibrium has unpredictable consequences. Thus, while there is good 
reason to use both pro- and anti-inflammatory therapeutic approaches 
in the treatment and prevention of HIV infection, they should be embarked 
upon with careful assessment of the virological and immunological con- 
sequences before widespread implementation. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Dose escalation study. We performed a dose escalation study in two rhesus macaques
with chronic SIVmac251 infection and different frequencies of CD4 T cells and
CCR5+ CD4 and CD8 T cells to maximize the likelihood of detecting a response.
IFN-1ant was dosed three times a week based on dosing of recombinant IFN-α2a
(Roferon-A, Roche, Switzerland) for hepatitis C infection. We administered 50 µg of
IFN-1ant for one week, 200 µg for one week, 500 µg for one week and 800 µg for one
week. Based on a dose-dependent increase in CD4 T-cell frequency and decrease in
the frequencies of CCR5+ CD4 and CCR5+ and Ki67+ CD8 T cells (Extended Data
Fig. 1a–d), and because of ease of administration, we decided on 1 mg of IFN-1ant.
Daily dosing was chosen as the effects appeared to be variable based on the time
since last dosing.

Macaques and experimental design. To examine the effects of blocking type I
IFN signalling, healthy, SIV-uninfected Mamu A01− B08− B17− adult Macaca mulatta
received 1 mg daily of the IFN-I receptor antagonist (IFN-1ant,
1 mg ml⁻¹, n = 6; synthesized as previously described) or 1 ml normal saline (n = 9)
by intramuscular injection for 4 weeks (see Extended Data Fig. 1e). The macaques
were inoculated weekly, starting the first day of treatment, with 1 ml of SIVmac251 via
rectal challenge (1 ml of a 1:25 dilution, stock 3 × 10° SIV RNA copies ml⁻¹), up to
three times until infection was confirmed (SIV pVL >250 copies ml⁻¹). Blood was
sampled several times per week, and lymph node and jejunal biopsies were performed
before treatment and at the end of the 4-week period. All blood draws and
biopsies were performed before inoculation with SIVmac251 and before any drug
administration on that day. The macaques were followed for up to 40 w.p.i.

To examine the effect of exogenous IFN-I treatment, six healthy, SIV-uninfected
Mamu A01− B08− B17− adult macaques received 6 µg kg⁻¹ pegylated IFN-α2a
(Pegasys, Genentech USA) intramuscularly weekly starting one week before the first
SIVmac251 inoculation. Dose was based on prior efficacy studies in rhesus macaques.
The macaques were challenged intrarectally weekly on the same day as, but before,
IFN-α2a administration, with the same high-dose SIVmac251 inoculation as the
IFN-1ant and placebo macaques, until infection was confirmed. All blood draws and
biopsies were performed before inoculation with SIVmac251 and before any drug
administration on that day. The macaques were followed for a total of 12 weeks
after infection and then euthanized per protocol. To examine the effect of exogenous
IFN-I treatment in the absence of SIV infection, three healthy, SIV-uninfected
Mamu A01− B08− B17− adult macaques received 6 µg kg⁻¹ pegylated IFN-α2a
(Pegasys, Genentech USA) intramuscularly weekly for 3 weeks. Blood was sampled
twice weekly starting 1 week before IFN-α2a treatment through to 1 week after the
last dose. The lymph node, jejunum and rectum were biopsied before IFN-α2a
treatment, 1 week after the first dose and 1 week after the last dose.

All macaques were housed at Bioqual, Inc., and assigned randomly to treatment
or placebo arms. TRIMCyp, the fusion protein derived from TRIM5 and cyclophilin A,
was present in 1 placebo macaque, 2 IFN-1ant macaques and 0 IFN-α2a
macaques. One TRIM5α SPRY deletion was present in 4 placebo macaques, 3 IFN-1ant
macaques and 1 IFN-α2a macaque (the macaque that required 5 challenges to
become systemically infected) and two TRIM5α SPRY deletions in 2 IFN-1ant
macaques. There was no significant difference in genotype distribution between
placebo and IFN-1ant macaques or between placebo and IFN-α2a macaques based
on Fisher’s test. Based on availability, the 6 IFN-1ant macaques and 9 placebo
macaques were male and aged 4 to 7 years, and the 6 IFN-α2a macaques were female
and aged 9 to 15 years. The Vaccine Research Center Animal Care and Use Committee
approved all study protocols and procedures.

Samples. Blood was collected in EDTA tubes. Plasma was collected and PBMCs 
were isolated by Ficoll density centrifugation. All of the jejunum and half the lymph 
node biopsy tissues and half of every tissue collected at necropsy were placed in 
RPMI with 10% fetal bovine serum (FBS) and transported on wet ice. The remain- 
ing tissues were placed in 4% paraformaldehyde and kept at room temperature 
overnight before being transferred to 80% ethanol and stored at 4 °C. Tissues were 
subsequently paraffin-embedded for in situ hybridization and immunohistochemistry.
Intestinal samples were incubated with RPMI + collagenase D (1 mg ml⁻¹)
(Roche) + Penicillin-Streptomycin-Glutamine (Gibco) at 37 °C for 30 min and
then passed through a 70 µm filter. Lymph node samples were passed through a
70 µm filter to remove debris. Cells isolated from both peripheral blood and tissues
were either stained for flow cytometry or cryopreserved. 

Flow cytometry. Cellular activation and cell cycle entry were assessed by flow 
cytometry. PBMCs and cells from the jejunum and lymph nodes were stained with 
Aqua LIVE/DEAD Fixable Dead Cell Stain and antibodies to the following: CD4 
Qd605, CD8 Qd655 (Invitrogen); CD3 APC-Cy7, CD95 PE-Cy5, CD14 Pacific Blue, 
CCR5 PE, CCR7 PE-Cy7, Ki67 FITC (BD Biosciences); CD28 ECD (Beckman Coulter);
and HLA-DR Alexa 700PE (BD Biosciences, in-house conjugate). Cells were per- 
meabilized using a Cytofix/Cytoperm kit (BD Biosciences) for Ki67 detection and 
fixed with 1% formaldehyde (Tousimis). 


To assess antigen-specific responses and NK cell subsets, cryopreserved PBMCs
were stimulated for 6 h at 37 °C with SIV Gag/Env/Pol peptide pool (2 µg ml⁻¹)
in the presence of CD28 and CD49d (BD Biosciences) and brefeldin A (Sigma,
10 µg ml⁻¹). Cells were stained with Aqua LIVE/DEAD Fixable Dead Cell Stain
(Invitrogen) and antibodies to the following: CD3 APC-Cy7, CD14 Pacific Blue,
IFN-γ Cy7PE, TNF APC, CD107a FITC (BD Biosciences); Granzyme B PE (Caltag);
CD4 Qd605, CD8 Qd655 (Invitrogen); CD16 Ax594, CD20 Ax700PE, CD56 Cy5PE, 
Perforin Ax680 (in-house conjugates, BD Biosciences). Cells were permeabilized 
using a Cytofix/Cytoperm kit (BD Biosciences) for intracellular cytokine detection 
and fixed with 1% formaldehyde (Tousimis). 

To evaluate cellular exhaustion, cryopreserved PBMCs were stimulated for 6 h
at 37 °C with SIV Gag/Env/Pol peptide pool (2 µg ml⁻¹) in the presence of brefeldin
A (10 µg ml⁻¹). Cells were stained with Aqua LIVE/DEAD Fixable Dead Cell Stain
(Invitrogen) and antibodies to the following: CD3 APC-Cy7, CD95 Cy5PE, IFN-γ
FITC, TNF APC, Bcl-2 PE (BD Biosciences); CD28 ECD (Beckman Coulter); ICOS 
Pacific Blue (Biolegend); CD4 Qd605, CD8 Qd655, Streptavidin Cy7PE (Invitrogen); 
PD-1 biotinylated (R&D Systems). Cells were permeabilized using a Cytofix/Cytoperm 
kit (BD Biosciences) for intracellular cytokine detection and fixed with 1% form- 
aldehyde (Tousimis). 

Transmitted/founder variant characterization. The number of transmitted/founder
variants was determined blindly, as previously described, and sequences were
deposited in GenBank under accession numbers KJ201031 to KJ201503.

Determination of susceptibility of circulating SIV to IFN-α. Freshly isolated,
CD8-depleted PBMCs from naive rhesus macaque donors were stimulated for
3 days with 5 µg ml⁻¹ PHA and 100 U ml⁻¹ IL-2 in RPMI supplemented with 10%
FBS, 2 mM L-glutamine, 100 U ml⁻¹ penicillin and 100 µg ml⁻¹ streptomycin
(RPMI-Complete). Target cells were re-suspended at 10° cells ml⁻¹ in RPMI-Complete
containing various concentrations of recombinant human IFN-α (PBL Interferon Source)
and incubated at 37 °C for 4 h. IFN-α-containing culture supernatants were collected
and stored at 37 °C. Target cells were split into duplicate cultures and spinoculated for
2 h at 800g with diluted, equivalent input amounts of SIV from plasma from 14 to
18 d.p.i. from IFN-α2a or placebo macaques. Cells were washed and re-suspended in
matched, stored IFN-α-containing supernatants supplemented with 100 U ml⁻¹
IL-2. After incubation at 37 °C for 7 days, cell-free culture supernatants were collected
and a SIV p27 antigen capture assay was used to detect the presence of viral
p27 antigen according to the manufacturer’s instructions (ABL).

Binding and neutralizing antibody assays. IFN-binding antibodies were assessed
as previously described (ref. 31). To evaluate for neutralizing antibodies, A549 cells
(American Type Culture Collection, Manassas, VA) were seeded at 1 × 10* per well in
RPMI with 2% FBS and 2 mM L-glutamine and incubated for 24 h. For the standard
curve, IFN-α2b (Hoffman La Roche, Nutley, NJ) was added to cells in twofold serial
dilution from 5 IU ml⁻¹ to 0.04 IU ml⁻¹. For measurement of IFN-α2b-neutralizing
antibodies in plasma, IFN-α2b was added at a concentration of 0.5 IU ml⁻¹ along with
160-fold diluted plasma. Control wells received medium only or 5 IU ml⁻¹ IFN-α2b and
13.1 µg ml⁻¹ of a control neutralizing antibody. After an additional 24 h, the medium
was removed and replaced with RPMI 1640 containing 2% FBS, 2 mM L-glutamine
and encephalomyocarditis virus (EMCV, American Type Culture Collection,
Manassas, VA) at a multiplicity of infection of 0.5. Cells were stained with crystal violet
at 52 h and assessed for cytopathic effect as measured by optical density at 570 nm.
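
The standard curve described above can be illustrated with a short sketch. It assumes that the series starts at 5 IU per ml and is halved at each step until it reaches roughly 0.04 IU per ml; the function name and the floor value of 0.039 are hypothetical choices for illustration, not part of the published protocol.

```python
def twofold_series(start, floor):
    """Return twofold dilutions from `start` down to the last value >= `floor`."""
    series = []
    conc = start
    while conc >= floor:
        series.append(conc)
        conc /= 2  # each step halves the concentration
    return series


# Assumed endpoints: 5 IU/ml at the top, stopping near the 0.04 IU/ml quoted in the text.
standard_curve = twofold_series(5.0, 0.039)
for conc in standard_curve:
    print(f"{conc:.4f} IU/ml")
```

Under these assumptions the series has eight points, and the lowest one rounds to the 0.04 IU per ml endpoint given in the text.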
Quantitative RT-PCR. RNA was extracted from PBMCs preserved in TRIzol (Life
Technologies) or from thawed cryopreserved PBMCs by RNAzol RT (Molecular
Research Center, Inc.) according to the manufacturers’ instructions. Purified RNA
was added directly to a one-step quantitative RT-PCR reaction containing iScript
RT-iTaq Taq enzyme mix (BioRad). MX1, OAS2 and β2 microglobulin were labelled
with a 5′ FAM reporter and 3′ BHQ1 quencher (Biosearch Technologies). We used
the following oligonucleotide sequences: MX1 F AGGAGTTGCCCTTCCCAGA,
MX1 R CCTCTGAAGCATCCGAAATC, MX1 P TGACCAGATGCCCGCTGGTG;
OAS2 F CAGTCCTGGTGAGTTTGCAGT, OAS2 R CAGCGAGGGTAAATCCTTGA,
OAS2 P GCACTGGCATCAACAGTGCCAGA.

MX1 and OAS2 forward and reverse primers were used at 500 nM, and probes at
200 nM. Samples were run on an Applied Biosystems Sequence Detection System
7900HT (ABI). Expression levels of MX1 and OAS2 were normalized to β2 microglobulin
and calculated based on the ΔΔCT method.
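
As an illustration of the ΔΔCT calculation named above, the following minimal sketch computes relative expression of a target gene normalized to a reference gene and to a baseline sample. It assumes roughly 100% amplification efficiency (a doubling per cycle); the function name and the threshold-cycle values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target, ct_ref, ct_target_base, ct_ref_base):
    """Fold change of a target gene by the delta-delta-CT method.

    Assumes the amount of product doubles each cycle (100% efficiency).
    """
    delta_sample = ct_target - ct_ref          # normalize sample to reference gene
    delta_base = ct_target_base - ct_ref_base  # normalize baseline to reference gene
    delta_delta = delta_sample - delta_base    # compare sample with baseline
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: target (for example MX1) and reference (beta-2 microglobulin).
fold = relative_expression(ct_target=24.0, ct_ref=18.0,
                           ct_target_base=26.0, ct_ref_base=18.0)
print(fold)
```

In this made-up example the target crosses threshold two cycles earlier than at baseline while the reference is unchanged, which corresponds to a fourfold increase.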

Transcriptome analysis. Total RNA was prepared as described above. Polyadenylated 
transcripts were purified on oligo-dT magnetic beads, fragmented, reverse transcribed 
using random hexamers and incorporated into barcoded cDNA libraries based on the 
Illumina TruSeq platform. Libraries were validated by microelectrophoresis, quan- 
tified, pooled and clustered on Illumina TruSeq v2 flowcells. Clustered flowcells 
were sequenced on an Illumina HiSeq 2000 in 100-base single-read reactions. 

New rhesus macaque genome. RNA-seq data were analysed by alignment to a provisional
assembly (deposited under BioProject accession PRJNA214746) and annotation
of a new Indian Macaca mulatta genome (data provided by R.B.N., University of
Nebraska Medical Center and A. Zimin, University of Maryland). Comparison of
RNA-seq data generated using the v.4 assembly demonstrated a greater absolute
number of mapping reads and a higher proportion of mapped reads per sample than
alignment to the hg19 RefSeq or RheMac2 assembly (Supplementary Fig. 1 and
Supplementary Information).

RNA-seq data analysis. RNA-seq data were submitted to The Gene Expression
Omnibus (GEO) repository at the National Center for Biotechnology Information
(NCBI). RNA-seq data were aligned to a provisional assembly of Indian Macaca
mulatta (MaSuRCA rhesus assembly v.4) using STAR version 2.3.0e (ref. 32); parameters
were set using the annotation as a splice junction reference, and un-annotated
non-canonical splice junction mappings and non-unique mappings were removed from
downstream analysis. Transcripts were annotated using the provisional UNMC
annotation v4.12. Transcript assembly, abundance estimates and differential expression
analysis were performed using Cufflinks v2.1.1 and Cuffdiff (ref. 33). Samples with <49%
mapped reads or exhibiting considerable 3′ or 5′ bias were excluded from further
analysis. To reduce normalization bias due to varying read depths, samples were
analysed in two separate groups: group 1 comprised 166 samples and contained the
samples from PBMCs from the placebo + SIV, IFN-1ant + SIV and pegylated
IFN-α2a-treated + SIV animals, and the samples for the analysis of PBMCs, LN CD4 T
cells and rectal biopsies of uninfected, IFN-α2a-treated animals; the average number
of mapped reads was 12,188,890 (range: 3,237,374–87,371,564). Group 2 comprised
the lymph node CD4 T cells from the three SIV-infected groups, consisting
of 37 samples; the average mapped read count for this group was 4,898,448 (range:
1,173,455–19,310,483). Differentially expressed genes were defined by pair-wise
comparison of each time point to the day 0 baseline. We included genes that had any
acute time point (0 vs 7, 10, 21, 28) that was significantly differentially expressed by
a false discovery rate-corrected P value (q value) < 0.05. Differential gene lists were
uploaded to Ingenuity Pathway Analysis software (v1.0 Ingenuity Systems,
http://www.ingenuity.com/) and pathways with significant enrichment by Fisher’s exact
test and the Benjamini-Hochberg multiple testing correction were identified. Heat
maps and other visualization were generated using Partek Genomics Suite v6.6.
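
The q value threshold and the Benjamini-Hochberg correction mentioned above can be sketched as follows. This is a generic, minimal implementation of the Benjamini-Hochberg step-up adjustment, not the code used by Cuffdiff or Ingenuity Pathway Analysis; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values for four genes at one time point versus day 0.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [qi < 0.05 for qi in q]
print(q, significant)
```

A gene would be retained under the criterion in the text if its q value fell below 0.05 at any acute time point.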
Gene-set enrichment analysis (GSEA). GSEA was performed using the desktop
module available from the Broad Institute (http://www.broadinstitute.org/gsea/).
FPKMs for samples from the IFN-α2a and placebo groups at 7 d.p.i. were pre-filtered
to remove transcripts with insufficient read coverage, and then were ranked using the
signal-to-noise statistic. The gene-set was comprised of genes determined to be
upregulated in unstimulated FOXO3−/− macrophages relative to macrophages from
control mice. Significance was estimated using gene-set permutation.
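
The ranking step above can be illustrated with the basic form of the signal-to-noise statistic: the difference in group means divided by the sum of group standard deviations. This is a simplified sketch; the Broad Institute tool may apply additional safeguards for very small standard deviations that are omitted here, and the gene names and FPKM values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference of means divided by the sum of sample standard deviations."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


# Hypothetical FPKMs: (treated samples, placebo samples) per gene.
fpkm = {
    "GENE_A": ([4.0, 6.0], [1.0, 3.0]),
    "GENE_B": ([2.0, 2.5], [2.0, 3.0]),
}

scores = {gene: signal_to_noise(a, b) for gene, (a, b) in fpkm.items()}
# Rank genes from most upregulated in the first group to most downregulated.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

The ranked list is what an enrichment analysis would then scan for over-representation of the gene set near either end.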

SIV in situ hybridization, immunohistochemistry and quantitative image
analysis. SIV in situ hybridization was performed as previously described (ref. 34).
Immunohistochemistry with rabbit polyclonal antibodies against APOBEC3G (Prestige
Antibodies Powered by Atlas Antibodies HPA001812; Sigma-Aldrich), TRIM5α (Prestige
Antibodies Powered by Atlas Antibodies HPA023422; Sigma-Aldrich) and MX2 (Prestige
Antibodies Powered by Atlas Antibodies HPA030235; Sigma-Aldrich) was performed
using a biotin-free polymer approach (Rabbit Polink-2, Golden Bridge
International, Inc.) on 5 µm tissue sections mounted on glass slides, which were
dewaxed and rehydrated with double-distilled H2O. Antigen retrieval was performed
by heating sections in 0.01% citraconic anhydride containing 0.05% Tween-20
in a pressure cooker set at 122 °C for 30 s. Slides were rinsed in ddH2O, incubated
with blocking buffer (TBS containing 0.25% casein) and incubated with diluted
rabbit anti-APOBEC3G, rabbit anti-TRIM5α or rabbit anti-MX2 in blocking buffer
overnight at 4 °C. Tissue sections were rinsed in wash buffer (1× TBS containing
0.05% Tween-20) for 10 min followed by an endogenous peroxidase blocking step
using 1.5% (v/v) H2O2 in TBS (pH 7.4) for 10 min and placed in wash buffer. Slides
were incubated with rabbit Polink-2 HRP polymer-staining system (Golden Bridge
International, Inc.) according to manufacturer’s recommendations (20–30 min at
room temperature) then rinsed in wash buffer. Tissue sections were developed with
Impact 3,3′-diaminobenzidine (Vector Laboratories), counterstained with haematoxylin
and mounted in Permount (Fisher Scientific). All stained slides were scanned
at high magnification (×200) using the ScanScope CS System (Aperio Technologies,
Inc.) yielding high-resolution digital scans of the entire tissue section. Regions of
interest of defined area (ROIs; 500 µm²) were saved on the digital image using the
Aperio rectangle tool (representing nearly the entire lymph node section) and
high-resolution images were extracted from the ROIs of each whole-tissue scan. The per
cent area of the lymph node (all anatomical compartments were included) that
stained for APOBEC3G, TRIM5α and MX2 was quantified under blind analysis
using Photoshop CS5 and Fovea tools.

Statistical methods. Based on our previous data of rhesus macaques with acute
SIV infection, the standard deviation for SIV RNA levels is 0.5 × 10° copies ml⁻¹.
Using this value, 6 macaques in the IFN-1ant or IFN-α2a group and 9 macaques in
the placebo group would give us 80% power to detect a 0.8 × 10° copies ml⁻¹
difference in SIV RNA levels between IFN-1ant or IFN-α2a and placebo groups.
Macaques were assigned to their respective groups randomly. Experiments, except
as noted above, were not performed blindly. All replicates are biological replicates.
Each experiment was performed once. Comparisons between groups at singular
time points were performed with the Mann-Whitney U test, comparisons within
groups with the Wilcoxon matched-pairs signed rank test, survival curve comparisons
for per cent survival (IFN-1ant) and per cent uninfected (IFN-α2a) with the
log-rank (Mantel-Cox) test and correlations with Spearman coefficient, all using
GraphPad Prism v5.0d. Comparisons of AUCs were performed using linear
regression analysis adjusting for baseline values on JMP v10.
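
The power statement above can be checked approximately with a short sketch. The text does not say which test the power calculation was based on, so this example assumes a two-sided, two-sample comparison at alpha = 0.05 with a normal approximation; the function name is hypothetical, and the standard deviation (0.5) and difference (0.8) are taken in the same scaled units of copies per ml as in the text.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison (normal approximation)."""
    se = sd * sqrt(1.0 / n1 + 1.0 / n2)           # standard error of the mean difference
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value when the true difference is delta;
    # the negligible contribution of the opposite tail is ignored.
    return NormalDist().cdf(delta / se - z_crit)


power = approx_power(delta=0.8, sd=0.5, n1=6, n2=9)
print(f"{power:.2f}")
```

Because the normal approximation ignores the small-sample penalty of a t-based calculation, it gives a value somewhat above the 80% quoted in the text, which is broadly consistent with the stated design.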


31. Vanderford, T. H. et al. Treatment of SIV-infected sooty mangabeys with a type-I
IFN agonist results in decreased virus replication without inducing
hyperimmune activation. Blood 119, 5750-5757 (2012).

32. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,
15-21 (2013).

33. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution
with RNA-seq. Nature Biotechnol. 31, 46-53 (2013).

34. Brenchley, J. M. et al. Differential infection patterns of CD4+ T cells and lymphoid
tissue viral burden distinguish progressive and nonprogressive lentiviral
infections. Blood 120, 4172-4181 (2012).


©2014 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 1 | Dose escalation study for IFN-1ant and
experimental schema. a–d, Effects of three times weekly IFN-1ant dosing on
the frequency of CD4 T cells (a), CCR5+ CD4 T cells (b), CCR5+ CD8 T cells
(c) and Ki67+ CD8 T cells (d) in 2 rhesus macaques. Dose was 50 µg in week 1,
200 µg in week 2, 500 µg in week 3 and 800 µg in week 4. Vertical dotted lines
indicate the days a new dose was started. Black lines connect time points 4 days
after the first dose. Grey shading indicates treatment period. e, Six macaques
received 4 weeks of IFN-1ant intramuscularly starting at day 0 and were
challenged intrarectally with 1 ml of a 1:25 dilution of SIVmac251 (stock
concentration 3 × 10° SIV RNA copies ml⁻¹) at day 0 and followed until
developing end-stage AIDS. Nine macaques were treated with 4 weeks of
placebo saline intramuscularly starting at day 0 and challenged intrarectally
with SIVmac251 at day 0 and followed. Six macaques were injected weekly with
IFN-α2a starting 1 week before the first challenge and through 4 w.p.i.
Macaques required 2, 3 or 5 challenges to acquire systemic infection. Thus,
macaques received 6, 7 or 9 doses of IFN-α2a. Macaques were necropsied at
12 w.p.i. per protocol.

Extended Data Figure 2 | Effects of IFN-1ant on IFN-stimulated genes and
virus burden. a, b, MX1 (a) and OAS2 (b) expression by qRT-PCR during
acute SIV infection in IFN-1ant (red, n = 6) and placebo (blue, n = 9)
macaques. P values were calculated by Mann-Whitney U test. c, ISGs in
PBMCs in IFN-1ant and placebo macaques. P values represent the comparison
between IFN-1ant (n = 6) and placebo (n = 9) macaque FPKMs at 7 d.p.i.
d, e, SAMHD1 (d) and APOBEC3G (e) expression in the lymph nodes in
IFN-1ant (n = 6) and placebo (n = 9) macaques. P values were calculated by
Mann-Whitney U test. f, g, Plasma SIV RNA levels at 12 w.p.i. (f) or at peak
(g) stratified by the day that MX1 or OAS2 expression peaked in PBMCs in
IFN-1ant (n = 6) and placebo (n = 9) macaques. VL, viral load. P values were
calculated by Mann-Whitney U test. h, SIV gag levels in PBMCs stratified
by the day that MX1 or OAS2 expression peaked in PBMCs in IFN-1ant (n = 6)
and placebo (n = 6) macaques. P values were calculated by Mann-Whitney U
test. For all panels, IFN-1ant-treated macaques are represented in red,
placebo-treated macaques in blue.

Extended Data Figure 3 | Effects of IFN-1ant on CD4 T cells and on
immune activation. a, b, CD4/CD8 T-cell ratio in peripheral blood (a) and
lymph node (LN) (b) in IFN-1ant (Ant, n = 6) and placebo (Plac, n = 9)
macaques. Shading indicates treatment period. Error bars indicate range. Red
vertical line indicates day 0 of systemic SIV infection. For all panels, horizontal
bars indicate median values, and P values at different time points within
treatment groups were calculated by Wilcoxon matched pairs signed rank test
and between groups by Mann-Whitney U test. c–f, T-cell activation in lymph

The DNA methylation landscape of human early embryos

Hongshan Guo, Ping Zhu, Liying Yan, Rong Li, Boqiang Hu, Ying Lian, Jie Yan, Xiulian Ren, Shengli Lin,
Junsheng Li, Xiaohu Jin, Xiaodan Shi, Ping Liu, Xiaoye Wang, Wei Wang, Yuan Wei, Xianlong Li, Fan Guo,
Xinglong Wu, Xiaoying Fan, Jun Yong, Lu Wen, Sunney X. Xie, Fuchou Tang & Jie Qiao


DNA methylation is a crucial element in the epigenetic regulation 
of mammalian embryonic development. However, its dynamic
patterns have not been analysed at the genome scale in human pre- 
implantation embryos due to technical difficulties and the scarcity 
of required materials. Here we systematically profile the methylome 
of human early embryos from the zygotic stage through to post- 
implantation by reduced representation bisulphite sequencing and 
whole-genome bisulphite sequencing. We show that the major wave 
of genome-wide demethylation is complete at the 2-cell stage, contrary 
to previous observations in mice. Moreover, the demethylation of the 
paternal genome is much faster than that of the maternal genome, and 
by the end of the zygotic stage the genome-wide methylation level 
in male pronuclei is already lower than that in female pronuclei. The 
inverse correlation between promoter methylation and gene expres- 
sion gradually strengthens during early embryonic development, reach- 
ing its peak at the post-implantation stage. Furthermore, we show 
that active genes, with the trimethylation of histone H3 at lysine 4 
(H3K4me3) mark at the promoter regions in pluripotent human 
embryonic stem cells, are essentially devoid of DNA methylation in 
both mature gametes and throughout pre-implantation development. 
Finally, we also show that long interspersed nuclear elements or short 
interspersed nuclear elements that are evolutionarily young are dem- 
ethylated to a milder extent compared to older elements in the same 
family and have higher abundance of transcripts, indicating that early 
embryos tend to retain higher residual methylation at the evolution- 
arily younger and more active transposable elements. Our work pro- 
vides insights into the critical features of the methylome of human 
early embryos, as well as its functional relation to the regulation of 
gene expression and the repression of transposable elements.

DNA methylation is an important form of epigenetic modification
and has a crucial role in many biological processes, including repression
of gene transcription, maintenance of gene imprinting and X-chromosome
inactivation, and repression of transposable elements. The most dramatic
genome-wide changes of the methylome in mammals occur in
primordial germ cells and during pre-implantation development.
To obtain DNA methylation maps of human early embryos, we per- 
formed reduced representation bisulphite sequencing (RRBS) on human 
gametes, as well as pre- and post-implantation embryos (Supplementary 
Tables 1 and 2). The methylation profiles form four distinct clusters dur- 
ing early embryonic development (Extended Data Fig. 1a, b): the hypo- 
methylated oocytes and polar bodies, the hypermethylated sperm, the 
hypomethylated cleavage-stage embryos, and the hypermethylated post- 
implantation embryos, which are similar to those found in mice. Moreover,
the overall DNA methylation level of the gene body (the genomic
region from transcription start site (TSS) of a gene to its transcription 
end site (TES)) is higher than that of neighbouring intergenic regions 
and there is a markedly hypomethylated region around the TSS, similar
to patterns observed in other types of cells. The methylation level of the
gene body is relatively even, with a slight increase from the TSS to the 
TES and a clear reduction after the TES (Fig. 1a). Recently, it has been 
shown that mouse oocytes exhibit relatively high levels of non-CpG 
methylation in their genomes, but this has not been investigated in
human oocytes. We found that in human mature oocytes, as well as first 
and second polar bodies, there are significant levels of non-CpG methyl- 
ation around gene body regions in the genome; the enrichment pattern 
of this non-CpG methylation was similar to that of CpG methylation 
(Extended Data Fig. 1c, d). Moreover, the level of non-CpG methylation
around gene bodies in the oocytes is clearly correlated with the level of
expression of corresponding genes, indicating the potential functional
significance of non-CpG methylation (Extended Data Fig. 1e).
Previous observations in mice show that the most marked demethy- 
lation occurs at the zygotic stage, with mild gradual demethylation from 
this point onwards until the blastocyst stage. We found that the methylomes
of human embryos are similar to those of mouse embryos, but do
have distinct features. A dramatic decrease in DNA methylation occurs
between fertilization and the 2-cell stage, with the average level of methylation
decreasing from 54% in the sperm and 48% in metaphase II (MII)
oocytes, to 41% in the zygotes and further to 32% in the 2-cell embryos 
(Fig. 1b and Extended Data Fig. 2a). Notably, there are subtle changes 
in the level of DNA methylation between the 2-cell and morula stages, 
which differs from findings in mice. A further reduction of DNA methy- 
lation to around 29% occurs as the embryo progresses from the morula 
stage to the blastocyst stage in the inner cell mass (ICM). Following implan- 
tation, a sharp increase in the level of methylation is observed (Fig. 1b 
and Extended Data Fig. 2b-d). The genomic regions with high CpG den- 
sity tend to be hypomethylated, whereas those with low CpG density 
tend to be hypermethylated (Extended Data Fig. 2e-g). Notably, the 
genome-wide demethylation in mouse embryos occurs mainly at the 
1-cell stage, whereas in human embryos the demethylation occurs from
fertilization to the 2-cell stage. When we analysed the genomic regions, 
such as the genic and intergenic regions separately, the patterns of demeth- 
ylation and re-methylation are similar to those found when analysing 
the genome-wide levels of demethylation, indicating that the dynamic 
changes in DNA methylation are in general universal throughout the 
entire genome (Extended Data Fig. 3a, b). Although RRBS analysis covers 
the majority of CpG islands (CGIs), which are probably the most infor- 
mative methylation sites, it only covers around 10% of all the CpG sites 
in the human genome, leaving CpG-sparse regions unexplored. To resolve 
this issue and to obtain absolute quantification of DNA methylation at 
the whole-genome scale, we performed whole-genome bisulphite sequenc- 
ing (WGBS) on the ICM and post-implantation embryos, which account 
for the lowest and highest methylation status of the genome during early 
embryonic development, respectively (Supplementary Table 3). We found 
that the genome-wide methylation levels in the ICM and post-implantation

Figure 1 | Dynamics of the DNA methylome in human early embryos.
a, Averaged DNA methylation levels along the gene bodies and 15 kilobases
(kb) upstream of the transcription start sites (TSS) and 15 kb downstream of the 
transcription end sites (TES) of all RefSeq genes. b, Methylation landscape 
across each stage of human early embryos. The averaged DNA methylation 
level of each developmental stage is calculated based on the overlapped 
100-base-pair (bp) tiles detected in all of the developmental stages analysed. 
c, Averaged DNA methylation levels of human sperm and early embryos 
analysed by whole genome bisulphite sequencing (WGBS), including sperm 
(yellow bar; previously published), ICM of the blastocysts (green bar), and
post-implantation embryos (liver, red bar). d, Averaged DNA methylation 
levels of individual male and female pronuclei of zygotes at different time points 
after intra-cytoplasmic sperm injection (ICSI). Note that the DNA methylation 
levels decrease dramatically in both male (red line) and female (blue line) 
pronuclei and that at the late pronuclear stage the genome-wide methylation 
levels of male pronuclei are already lower than those of female pronuclei. The 
diamond and square represent sperm and MII oocytes, while the triangle and 
circle represent male and female pronuclei, respectively. The black triangle 
represents one outlier of the male pronuclei. The averaged DNA methylation 
level of each individual pronucleus is calculated based on individual CpG sites 
covered by three or more sequencing reads. 


embryos were 42.0% and 78.1%, respectively (Fig. 1c and Extended 
Data Fig. 4a), and that the distribution of DNA methylation on and 
around gene bodies accurately matched that inferred by RRBS (Fig. 1a 
and Extended Data Fig. 4b). 
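
The Fig. 1d caption states that the averaged methylation level of each pronucleus is calculated from individual CpG sites covered by three or more sequencing reads. A minimal sketch of that kind of calculation follows; the input layout (per-site methylated and total read counts) and the function name are assumptions for illustration, not the authors' pipeline.

```python
def mean_methylation(sites, min_reads=3):
    """Average per-CpG methylation over sites with enough coverage.

    `sites` is an iterable of (methylated_reads, total_reads) pairs.
    Sites with fewer than `min_reads` reads are ignored.
    Returns None when no site passes the coverage filter.
    """
    levels = [meth / total for meth, total in sites if total >= min_reads]
    if not levels:
        return None
    return sum(levels) / len(levels)


# Hypothetical counts: the second site is dropped (only 2 reads).
example_sites = [(3, 4), (2, 2), (0, 5), (6, 6)]
print(mean_methylation(example_sites))
```

Averaging per-site levels (rather than pooling reads) gives every covered CpG equal weight, which is one reasonable reading of the caption.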

We also analysed DNA methylation patterns in the first and second 
polar bodies and found that they are comparable to those found in MII 
oocytes (Extended Data Fig. 4c). This indicates that during the extrusion 
of the first and second polar bodies from the oocytes, there is no genome- 
wide asymmetric pattern of DNA methylation between them. Similarly, 
the methylation levels of different genomic regions in the trophectoderm 
of the blastocysts are only slightly lower than those in the ICM (Extended 
Data Fig. 4d). 

To gain further insight into the mechanism of the genome-wide deme- 
thylation, we analysed the methylomes of individual male and female pro- 
nuclei separately using a single-cell RRBS technique that we have recently 
developed’* (Extended Data Fig. 5a, b and Supplementary Table 3). We 
found that the demethylation process is very heterogeneous as the meth- 
ylation levels of individual male (or female) pronuclei at the same time 
point following intra-cytoplasmic sperm injection show marked variation 
(Fig. 1d). Furthermore, the demethylation of the paternal genome 
is much faster than that of the maternal genome, and in late pronuclear 
stage zygotes the genome-wide methylation level in male pronuclei is 
already lower than that in female pronuclei (Fig. 1d and Extended Data 
Fig. 5c). This was further verified by immunostaining (Extended Data 
Fig. 6). Interestingly, both male and female pronuclei showed some evi- 
dence of hydroxymethylation, although usually the signals in male pro- 
nuclei were stronger than those in female ones. 

Next we analysed the similarities and differences in DNA methylation 
between sperm and oocytes. Sperm and oocytes exhibited comparable 
methylation patterns for 64.3% of all the covered 100-bp tiles, which are 
either hypermethylated (methylation levels ≥ 75%) or hypomethylated 
(methylation levels ≤ 25%) in both gametes (Extended Data Fig. 7a, b). 
Notably, the hypermethylated regions in both gametes exhibited DNA 
methylation dynamics similar to the mean genomic changes. These regions 
are strongly enriched in transposable elements such as short interspersed 
nuclear elements (SINEs) and long interspersed nuclear elements (LINEs) 
(Extended Data Fig. 7c), indicating that the hypermethylated regions 
in gametes of both sexes are mainly enriched in transposable elements, 
probably repressing their transcription and activity, as well as in introns, 
probably regulating gene transcription and splicing. In contrast, the 
hypomethylated regions are enriched in high-density CpG promoters, 
Figure 2 | Key features of gamete-specific differentially methylated regions 
(DMRs). a, Heat map of the methylation level of oocyte-specific DMRs among 
different genomic regions across different developmental stages. b, Heat map 
of the methylation level of sperm-specific DMRs among different genomic 
regions across different developmental stages. In panels a and b, the colour 
keys from green to red indicate low to high methylation level, respectively. 

c, Representative locus of a known maternal imprinting gene, KCNQ1, covered 
in our RRBS data set. The blue bars indicate the DNA methylation levels 

of different CpG sites. The region was fully methylated in MII oocytes, 
unmethylated in sperm cells and around 50% methylated in cleavage-stage 
embryos and post-implantation embryos. d, One known imprinting locus and 
one potential novel imprinting locus showing ASM, tracked with SNPs to 
distinguish their allele origins. The paired reads generated from the WGBS data 
sets with heterozygous SNPs were selected to show the DNA methylation levels 
of the two alleles separately. 
enhancers of a wide variety of human tissues, exons, and CGIs, and are 
kept hypomethylated throughout early embryonic development. 
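
The comparison of sperm and oocyte tiles above can be sketched as follows. Each 100-bp tile is labelled hypermethylated or hypomethylated using 75% and 25% cut-offs, and the fraction of shared tiles with the same label in both gametes is reported. Treating the cut-offs as inclusive, the dictionary layout and the function names are assumptions for illustration.

```python
def classify_tile(level, hyper=0.75, hypo=0.25):
    """Label a tile by its methylation level (a fraction between 0 and 1)."""
    if level >= hyper:
        return "hyper"
    if level <= hypo:
        return "hypo"
    return "intermediate"


def concordant_fraction(sperm, oocyte):
    """Fraction of tiles covered in both gametes that are hyper in both or hypo in both.

    `sperm` and `oocyte` map a tile identifier to its methylation level.
    """
    shared = sperm.keys() & oocyte.keys()
    if not shared:
        return 0.0
    concordant = 0
    for tile in shared:
        label_s = classify_tile(sperm[tile])
        label_o = classify_tile(oocyte[tile])
        if label_s == label_o and label_s != "intermediate":
            concordant += 1
    return concordant / len(shared)


# Hypothetical tiles: t1 and t2 agree, t3 differs, t4 is intermediate in both.
sperm_tiles = {"t1": 0.90, "t2": 0.10, "t3": 0.80, "t4": 0.50}
oocyte_tiles = {"t1": 0.95, "t2": 0.05, "t3": 0.10, "t4": 0.50}
print(concordant_fraction(sperm_tiles, oocyte_tiles))
```
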
We also examined differentially methylated regions (DMRs) between 
sperm and oocytes. In total, we identified 17,473 sperm-specific DMRs 
and 12,145 oocyte-specific DMRs (Extended Data Fig. 7d, e). The sperm- 
specific DMRs were relatively enriched in intergenic regions whereas 
oocyte-specific DMRs were enriched in intragenic regions (Fig. 2a, b 
and Extended Data Fig. 7f). Sperm-specific DMRs were strongly enriched 
for the tissue-specific enhancers marked by H3K4me1 in a wide variety of 
tissues compared with oocyte-specific DMRs. This indicates that sperm 
tends to hypermethylate enhancer elements that are active or permissive 
in tissues other than male germ cells, probably to avoid their aberrant acti- 
vation during male germ cell development. As expected, oocyte-specific 
DMRs localize to CGIs more frequently than sperm-specific DMRs. Similar 
to those in mice, some of the oocyte-specific DMRs probably retain 
allele-specific DNA methylation (ASM) patterns during pre-implantation 
development with an average methylation level around 50%. On the con- 
trary, the sperm-specific DMRs rapidly lose methylation before the 2-cell 
stage and retain only background levels of methylation. However, the 


[Figure 3 graphic; only the labels are recoverable. a, DNA methylation level of regions with histone marks (peak regions) and without histone marks for H3K4me3, H3K27me3 and H3K9me3, with sperm and MII oocyte marked. b, H3K27me3 signal and DNA methylation over ranked peaks overlapped with promoters, by developmental stage. c, Gene expression and promoter methylation over ranked genes (MII oocyte, zygote, 2-cell, 4-cell, 8-cell, morula, ICM, post-implantation). d, Genes with decreased expression levels and promoters with increased methylation levels from ICM to post-implantation; enriched terms: translation; translational elongation; regulation of protein ubiquitination; ribonucleoprotein complex biogenesis; proteasomal ubiquitin-dependent protein catabolic process; regulation of ubiquitin-protein ligase activity; RNA processing; proteasomal protein catabolic process; ribosome biogenesis; rRNA metabolic process; ncRNA metabolic process; x axis, −log (P value).]


608 | NATURE | VOL 511 | 31 JULY 2014 


majority of both oocyte-specific and sperm-specific DMRs were re- 
methylated after implantation (Fig. 2a, b and Extended Data Fig. 7d, e). 

Although RRBS analysis shows relatively poor coverage for imprint- 
ing genes, for the known imprinting DMRs it covers we found that the 
methylation levels are accurately maintained at around 50% from zygotic 
stage to post-implantation stage, as expected (Fig. 2c, d and Extended 
Data Fig. 8a, b). To extend the analyses to potentially novel imprinting 
genes, we took advantage of our WGBS data set from a post-implantation 
embryo and performed ASM analyses based on identified heterozygous 
SNPs. We found that known imprinted genes with enough sequencing 
coverage and available heterozygous SNPs showed ASM as expected. 
Moreover, we found 120 novel ASM regions (Fig. 2d, Extended Data 
Fig. 8c, d and Supplementary Table 4), some of which may potentially 
be novel imprinted DMRs. 
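
The allele-specific methylation (ASM) analysis described above groups bisulphite reads by the base they carry at a heterozygous SNP and then compares methylation between the two groups. The sketch below illustrates the idea under stated assumptions: the read tuple layout and the function name are hypothetical, and no statistical test or coverage filter is included.

```python
from collections import defaultdict


def allele_methylation(reads):
    """Methylation level per allele.

    `reads` is an iterable of (allele_base, methylated_cpgs, total_cpgs),
    one entry per read (or read pair) overlapping a heterozygous SNP.
    Returns {allele_base: methylated fraction}.
    """
    counts = defaultdict(lambda: [0, 0])
    for allele, methylated, total in reads:
        counts[allele][0] += methylated
        counts[allele][1] += total
    return {a: m / t for a, (m, t) in counts.items() if t > 0}


# Hypothetical locus: reads carrying A are mostly methylated, reads carrying G are not.
locus_reads = [("A", 4, 4), ("A", 3, 4), ("G", 0, 4), ("G", 1, 4)]
print(allele_methylation(locus_reads))
```

A locus whose two alleles sit near 100% and 0%, averaging to about 50% overall, is the pattern expected for an imprinted region.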

Next, we analysed the relationship between DNA methylation and 
histone modifications, both of which contribute to regulation of gene 
expression®"*. Since human embryonic stem (ES) cells are derived from 
and also similar to the pluripotent ICM of blastocysts, we used the chro- 
matin immunoprecipitation followed by sequencing (ChIP-seq) data 
set of the histone modifications of human ES cells compiled as part 
of the ENCODE Project (see online Methods)’”"’. Notably, we found 
that the H3K27me3 regions generally have low levels of DNA methyla- 
tion in human ES cells and the ICM; this is also the case in gametes and 
cleavage-stage embryos. As a control, the regions free of H3K27me3 
peaks tend to have much higher levels of DNA methylation in the ICM 
(Fig. 3a). This indicates that, similar to human ES cells, in the pluripo- 
tent ICM of blastocysts the genomic regions containing H3K27me3 marks 
probably exhibit little DNA methylation. In fact, when we analysed the 
DNA methylation of promoter regions of genes with H3K27me3 marks, 
we found that there is a strong negative correlation between the DNA 
methylation level of the promoter and H3K27me3 enrichment, indi- 
cating that in the pluripotent ICM, DNA methylation and H3K27me3 
repress different sets of target genes (Fig. 3b and Extended Data Fig. 9a). A 
similar mechanism has been found during the early lineage differentiation 
of human ES cells”°?!. In contrast to H3K27me3, the H3K9me3-marked 


Figure 3 | Relationships between DNA methylation, histone modifications 
and gene expression. a, Average DNA methylation levels of regions with 
histone marks (red line) and without these histone marks (blue line) among 
consecutive developmental stages (details of biological replicates of each stage 
are listed in Supplementary Table 1) and human ES cells (n = 2). The left, 
middle and right panels represent H3K4me3, H3K27me3 and H3K9me3, 
respectively. The green dot indicates sperm, while the purple dot indicates MII 
oocytes. Data are mean ± 95% confidence interval (± 1.96 standard error of 
the mean (s.e.m.)). b, Scatter plot of the signal intensities of H3K27me3 
ChIP-seq peaks within promoter regions in human ES cells and the DNA 
methylation levels of the corresponding peak regions. The Pearson correlation 
coefficients (r) between peak signal intensities and DNA methylation levels 
of peak regions across every developmental stage were calculated and are 
included on the top right corner of each panel. The red and blue fitting curves 
represent peak signal intensity and DNA methylation level in corresponding 
regions, respectively. The horizontal axis from left to right of each box 
represents the H3K27me3 peaks, which overlapped with promoter regions, 
ranked by peak signal intensities from high to low, respectively. c, Scatter plot 
of DNA methylation levels of promoter regions and the relative expression 
levels of corresponding RefSeq genes. The log2 of the gene expression levels 
(reads per kilobase per million) were calculated and are presented. The Pearson 
correlation coefficients (r) between DNA methylation levels of promoter 
regions and the scaled expression levels of the corresponding genes across every 
developmental stage were calculated and are included in the top right corner 
of each panel. The red and blue fitting curves in each display represent gene 
expression levels and DNA methylation levels in promoter regions, 
respectively. The horizontal axis from left to right below each box represents 
the expression levels from high to low of RefSeq genes, respectively. d, Gene 
ontology analysis of the genes inactivated during implantation (fold change 
(ICM divided by post-implantation) ≥ 2), (P ≤ 0.05, P value is calculated based 
on the multiple Student’s t-test), while the methylation levels of their promoters 
increased. The x axis represents the negative log of the P values of the 
enrichment of the corresponding gene ontology terms. 
regions in human ES cells exhibit similar DNA methylation levels compared 
with the control regions in the ICM, indicating that some overlap 
exists between DNA methylation and H3K9me3 in the genomes of the 
ICM of blastocysts, probably at constitutive heterochromatin regions 
(Fig. 3a and Extended Data Fig. 9b). 
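
The error bars in Fig. 3a are described in its caption as the mean with a 95% confidence interval of 1.96 times the standard error of the mean (s.e.m.). A minimal sketch of that computation over biological replicates is shown below; the replicate values and the function name are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using mean +/- 1.96 * s.e.m.

    Requires at least two replicate values, because the sample
    standard deviation is undefined for a single value.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical methylation levels from three replicates of one stage.
replicates = [0.40, 0.42, 0.44]
mean, lower, upper = mean_ci95(replicates)
print(f"{mean:.3f} ({lower:.3f} to {upper:.3f})")
```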

We also examined H3K4me3, a mark of promoters”, and found that 
H3K4me3 peaks tend to have very low levels of DNA methylation even 
in the oocytes and sperm (10% and 5%, respectively) (Fig. 3a). That is, 
the genes that are active, or at least permissive, in pluripotent human ES 
cells, and probably also in pluripotent ICM cells, are retained with min- 
imal levels of DNA methylation in the gametes, and DNA demethyla- 
tion is in general not required for these regions during early embryonic 
development (Extended Data Fig. 9c). 

Next we investigated how DNA methylation regulates gene expres- 
sion, using the single-cell transcriptome data we recently published”. As 
expected, the DNA methylation levels at promoter regions negatively 
correlate with the expression levels of corresponding genes during early 
embryonic development'*”°. Moreover, the strength of this correlation 
increases gradually from the zygotic stage to post-implantation, especially 
after zygotic genome activation at the 8-cell stage” (Fig. 3c and Extended 
Data Fig. 9d). This indicates that although the genome-wide demethy- 
lation and re-methylation occur during early embryonic development, 
the DNA methylation at promoter regions still represses the expression 
of corresponding genes. It also indicates that after the zygotic genome 
activation at the 8-cell stage, the methylome and transcriptome of the 
embryos have direct corresponding relations and the repression of gene 
expression by DNA methylation at promoter regions is more promi- 
nent. Furthermore, during the genome-wide re-methylation following 
implantation, 258 genes with increased promoter methylation showed 
decreased RNA expression from ICM to post-implantation embryos. 
Gene ontology analysis showed that this set of genes was clearly enriched 
for the terms associated with translation and ubiquitination-related pro- 
tein degradation, which indicates that the translation and protein degra- 
dation processes in proteasomes undergo fundamental changes during 
implantation, potentially due to DNA re-methylation (Fig. 3d). Notably, 
the DNA methylation on the gene bodies shows positive correlation with 
the expression levels of corresponding genes during pre-implantation 
development (Extended Data Fig. 9e, f). 
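
The relationship reported above is a Pearson correlation between promoter methylation and the expression of the corresponding genes at each stage. The sketch below computes the coefficient from first principles; the input values are hypothetical and the function name is an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genes: promoter methylation (fraction) and log2 expression.
promoter_methylation = [0.05, 0.10, 0.60, 0.90]
log2_expression = [6.0, 5.0, 2.0, 1.0]
r = pearson_r(promoter_methylation, log2_expression)
print(f"r = {r:.2f}")  # negative: higher methylation goes with lower expression
```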

Furthermore, we explored how DNA methylation contributes to the 
repression of transposable elements. We found that the expression levels of 
SINE/variable number of tandem repeats/Alu elements (SVAs) increased 
sharply from the 4-cell stage to morula stage (Extended Data Fig. 10a). 
This indicates that after the genome-wide demethylation of transpos- 
able elements at the 2-cell stage, the transcriptional activities of SVAs 
increase markedly for a short period but decrease back to the basal level 
before the genome-wide re-methylation, probably due to repression mech- 
anisms other than DNA methylation (Extended Data Fig. 10b). SINEs, 
LINEs and long terminal repeats retained relatively abundant transcripts 
before the 8-cell stage, following which their transcripts gradually decreased 
to the basal level in post-implantation embryos (Fig. 4a, b and Extended 
Data Fig. 10c). Moreover, we found that SINEs and LINEs with differ- 
ent evolutionary ages show different demethylation patterns (Fig. 4a—d 
and Extended Data Fig. 10d-f). For example, both LINE-1 (L1) and LINE-2 
(L2) belong to the LINE family of transposable elements, with L1 being 
evolutionarily younger than L2 (ref. 25). The younger L1 shows a higher 
methylation level than L2 in oocytes, whereas they show a comparable 
level of methylation in sperm. More importantly, we found that during 
the genome-wide demethylation process, the evolutionarily younger 
L1 retains higher levels of residual methylation than L2. L1 was also 
re-methylated to a higher methylation level after implantation (Fig. 4d). 
This pattern persists when further subdividing L1 into sub-groups, such 
as L1PA and L1ME with different evolutionary ages (Fig. 4e and Extended 
Data Fig. 10e, f). Similar patterns were also found for Alu elements and 
mammalian-wide interspersed repeats (MIR), two subfamilies of SINEs 
with younger and older evolutionary ages, respectively (Fig. 4a, b). This 
indicates that the evolutionarily young transposable elements with higher 
Figure 4 | Dynamics of DNA methylation and expression patterns of 
transposable elements. a, Line chart of the relative expression level 
(sequencing read counts, normalized by total mappable RefSeq read counts) of 
short interspersed nuclear elements (SINEs). b, Average DNA methylation 
levels of Alu and MIR across early embryonic stages. c, Line chart of the relative 
expression level (sequencing read counts, normalized by total mappable RefSeq 
read counts) of long interspersed nuclear elements (LINEs). d, Average 
DNA methylation levels of L1 and L2 across early embryonic stages. e, Average 
DNA methylation levels of two subfamilies of L1 repeat elements across 
early embryonic stages, with L1PA evolutionarily younger than L1ME. The 
green dot indicates sperm, while the purple dot indicates MII oocytes. All data 
in panels a–e are mean ± 95% confidence interval (± 1.96 s.e.m.). Biological 
replicates in panels a and c: MII oocyte (n = 3), zygote (n = 3), 2-cell (n = 6), 
4-cell (n = 12), 8-cell (n = 20), morula (n = 14), ICM (n = 10), post- 
implantation (n = 3); details for biological replicates of each stage in panels 
b, d, e are listed in Supplementary Table 1. 


transcriptional activity tend not to be demethylated to the same extent 
as older elements and also retain higher levels of remnant methylation, 
probably to maintain stronger repression of their transcription and 
activity by DNA methylation (Extended Data Fig. 10). 

Our work provides a comprehensive atlas at the genome-wide scale 
of the DNA methylation landscape in human mature gametes” and early 
embryos, which offers new insights that are distinct from those derived 
from previous findings in mice’. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Informed consents and bioethical approval. This study was approved by the 
Reproductive Study Ethics Committee of Peking University Third Hospital (Research 
license 2012SZ015). All of the gametes and embryos were collected voluntarily after 
obtaining the written informed consents signed by donor couples. All of the gametes 
and embryos were obtained from the donors at the Center for Reproductive Medicine 
in Peking University Third Hospital using standard clinical protocols as described”””*. 
The donors of oocytes, zygotes, 2-cell and 4-cell embryos were financially compen- 
sated for their effort, time, inconvenience and discomfort associated with the dona- 
tion process. 

Preparations of human oocytes and pre- and post-implantation embryos. The 
average age of all donors in this study was 30 years. All embryos used in this 
research had good morphology with appropriate developmental speed”**. Oocytes, 
sperm, zygotes, 2-cell stage and 4-cell stage embryos were voluntarily donated by 
healthy volunteers who had already had one or two healthy children from natural 
pregnancy. 

Oocytes were obtained from the donors at the Center for Reproductive Medicine in 
Peking University Third Hospital using standard clinical protocols as described’’. 
Embryos at the 8-cell, morula and blastocyst stages were donated by couples who 
had undergone in vitro fertilization (IVF) treatments. These donor couples, whose 
infertility was purely due to female tubal factors, had already had a healthy baby 
through the IVF cycle. They then donated the surplus frozen embryos for research 
after signing written informed consent. 

The cumulus cells around the oocytes were removed by hyaluronidase (Sigma) 
treatment. Only the mature MII oocytes were used in our study. Embryos produced 
by intracytoplasmic sperm injection (ICSI) were cultured in G1.3 medium (Vitrolife, 
Sweden) droplets covered by mineral oil (Sigma) until day 2. Pronuclear, 2-cell and 
4-cell stage embryos were picked at the appropriate time’. The first and second 
polar bodies were collected by a micropipette with laser-assisted biopsy, at 2 and 
8h after ICSI, respectively (Extended Data Fig. 1a). 

Male and female pronuclei at different stages were collected from the zygotes at 
several precisely timed points after ICSI. Before isolation, zygotes 
were first immersed in G-MOPS (Vitrolife), supplemented with 5 µg ml⁻¹ cytochalasin 
B (Sigma), and then stained with 5 µg ml⁻¹ Hoechst 33342 (Invitrogen) at 
37 °C for 10 min. Then the visible pronuclei were carefully removed by laser-assisted 
biopsy (HAMILTON ZILOS-tK, CRI). Single-cell reduced representation bisulphite 
sequencing (RRBS) libraries of male and female pronuclei were prepared following 
our previously published protocol step by step'’. Male and female pronuclei were 
further discriminated based on unsupervised hierarchical clustering of genome- 
wide methylation patterns of pronuclei and gametes (sperm and MII oocytes), and 
according to the methylation levels of the covered sperm-specific DMRs (fully meth- 
ylated in the sperm but unmethylated in the oocytes) and oocyte-specific DMRs (fully 
methylated in the oocytes but unmethylated in the sperm), respectively (Extended 
Data Fig. 5a, b). The results of these two analyses were consistent with each other. 
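
One way to picture the DMR-based discrimination of pronuclei is sketched below: a pronucleus whose covered sperm-specific DMRs are more methylated than its covered oocyte-specific DMRs is called male, and the reverse is called female. The decision rule (a direct comparison of the two means), the input layout and the function name are assumptions for illustration; the text does not state the exact criterion used.

```python
def assign_pronucleus(sperm_dmr_levels, oocyte_dmr_levels):
    """Guess parental origin from methylation of gamete-specific DMRs.

    Both arguments are lists of methylation levels (fractions) measured
    in one pronucleus at the covered sperm-specific and oocyte-specific DMRs.
    """
    if not sperm_dmr_levels or not oocyte_dmr_levels:
        return "undetermined"
    sperm_mean = sum(sperm_dmr_levels) / len(sperm_dmr_levels)
    oocyte_mean = sum(oocyte_dmr_levels) / len(oocyte_dmr_levels)
    if sperm_mean > oocyte_mean:
        return "male"
    if oocyte_mean > sperm_mean:
        return "female"
    return "undetermined"


# Hypothetical pronucleus: sperm-specific DMRs still methylated, oocyte-specific DMRs not.
print(assign_pronucleus([0.8, 0.7, 0.9], [0.1, 0.0, 0.2]))
```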

The frozen day 3 cleavage-stage embryos were thawed by taking straws from the 
liquid nitrogen tank, followed by several steps to remove the cryoprotectant. Thawed 
embryos were transferred to G2 culture medium (Vitrolife) for further culture. The 
8-cell, morula and late blastocyst stage embryos were collected at 2 h, 24 h and 72 h 
after thawing, respectively. 

The selected embryos or oocytes were transferred into an acidic solution drop 
(1 ml of PBS supplemented with 1 µl of 36% HCl) to remove the zona pellucida. All 
of the embryos without zona pellucida were then washed several times by gentle 
repeated pipetting to eliminate any attached cumulus cells or polar bodies. The inner 
cell mass (ICM) and trophectoderm (TE) of late blastocysts were cut under a stereo- 
scope by mechanical dissection with a glass needle (Extended Data Fig. 1a). 

The post-implantation samples for RRBS were obtained from the patients with 
multifetal pregnancy between 6 and 10 weeks of gestation with signed informed 
consents from donors. Transvaginal-ultrasound-guided reduction by intrathoracic 
KCl injection was applied to get the 10-week-old embryo samples”*. The 6-week-old 
fresh and un-fragmented embryos for clean tissue isolation were expelled from the 
patients after first being treated with mifepristone (75 mg per day for 2 days) 
and then 600 mg misoprostol. In order to obtain relatively pure tissues from these 
early embryos, the expelled embryos were first washed carefully in PBS several 
times to remove all of the potential maternal contaminants. The liver tissues were 
carefully dissected under the microscope. The isolated relatively pure tissues for the 
later whole genome bisulphite sequencing (WGBS) library construction were gently 
transferred into a new dish, and were washed several times with PBS to remove 
any excess blood and other possible contaminants. The gestational ages of the 
implanted embryos used in our study were assessed by the ultrasound result and 
the morphology and size of the embryos. 

Semen collection and preparation. Human semen samples were obtained from 
healthy donors by masturbation after an abstinence period of 2-3 days. After seminal 
liquefaction, the sample was transferred to a sterile 10 ml centrifuge tube, and washed 
twice with HTF medium (Lifeglobal) supplemented with 10% human serum albumin 
(HSA, Vitrolife). After washing, the supernatant was removed with a sterile pipette 
and 2 ml HTF medium was carefully added into the tube on the top of the pellet. After 
swimming-up, the sperm with vigorous activity separated from the leukocytes and 
dead sperm were collected for DNA extraction. 

Immunostaining of the oocytes and early embryos. Human zygotes and 2-cell 
embryos were harvested at precise time points after ICSI. Human 
MII oocytes, zygotes and 2-cell embryos were first treated with acidic solution (pH 2.5, 
1 ml of PBS supplemented with 1 µl of 36% HCl) to remove the zona pellucida. After 
thorough washing, the samples were fixed with 4% paraformaldehyde (Sigma) for 
30 min at room temperature and washed three times in PBS, supplemented with 
0.1% BSA (Sigma), and then followed by membrane permeabilization with 1% 
Triton X-100 (Sigma) for 2 h. For 5mC and 5hmC staining, the DNA of the oocytes 
and embryos was denatured with 4 M HCl at room temperature for 20 min and 
subsequently neutralized by 100 mM Tris-HCl buffer (pH 8.0) for 20 min, with 0.1% 
BSA (Sigma). After washing, samples were blocked in blocking solution, which 
contains 3% BSA (Sigma) in PBS. After blocking at 4 °C overnight, samples were 
incubated with anti-5-methylcytosine antibody (1:50, BIMECY-0500; Eurogentec) 
or anti-5-hydroxymethylcytosine antibody (1:50, 39769; Active Motif) for 2h at 
37 °C. After washing three times, the samples were incubated with Alexa Fluor 568 
goat anti-mouse IgG (1:500, A-11004; Invitrogen) or donkey anti-rabbit IgG-FITC 
(1:100, sc-2012; Santa Cruz) for 2h at 37 °C. 

Since the asymmetrical distribution of histone lysine methylation is maintained 
in zygotes after sperm penetration, H3K9me3 was selected to mark the maternal 
chromatin and female pronuclei. Several of the zygotes were co-stained with H3K9me3 
(1:50, 07-442; Millipore) and 5mC antibody (1:50, BIMECY-0500; Eurogentec). 

The final 5mC, 5hmC and H3K9me3 signals were detected using confocal micros- 
copy (Carl Zeiss 710). All images were acquired and analysed using the ZEN imag- 
ing software (Carl Zeiss). 

We performed an immunofluorescence experiment for 5mC and 5hmC (Extended Data 
Fig. 6), which can clearly discriminate between 5hmC and 5mC. As a positive control, 
mouse zygotes were also co-stained for 5mC and 5hmC following the same 
protocol and using the same batch of antibodies. From the immunostaining results 
of mouse zygotes, we found that the male pronuclei, especially in the late stage zygotes 
(18h post coitum), were stained with strong 5hmC signal, while the female pronuclei 
were mainly positive for 5mC, which is consistent with previous publications” *”. We 
then applied the same protocol to the human zygotes with minor modifications. 
The 5hmC and 5mC staining signals in human zygotes are more complex than in 
the mouse zygotes. It seems that both male and female pronuclei have 5hmC sig- 
nals in some zygotes. 
Purification of genomic DNAs and total RNAs. Genomic DNAs of the post- 
implantation embryos or tissues, or collected sperm cells, were extracted using the 
DNeasy Blood and Tissue kit (Qiagen) following the manufacturer’s instructions. 
The total RNAs of these post-implantation embryos or tissues were prepared using 
RNeasy Mini Kit (Qiagen) coupled with on-column DNA digestion following the 
manufacturer’s standard protocol. The qualities of total RNAs were determined by 
Agilent 2100 Bioanalyzer (Agilent) before the construction of the sequencing libraries. 
RRBS library preparation. All of the gametes and embryos, including MII oocytes, 
the first and second polar bodies, zygotes, 2-cell, 4-cell, and 8-cell embryos, morula, 
ICM, and trophectoderm cells (TE), were washed several times in PBS for the later 
RRBS libraries construction. Starting from limited numbers of embryos or single 
pronuclei, we integrated DNA extraction, MspI digestion, end-repair, dA-tailing, 
adaptor-ligation, and bisulphite-conversion into a one-tube reaction, to minimize 
the unnecessary DNA losses’*. And we spiked in 0.5% unmethylated lambda DNA 
(Fermentas) to the gDNA samples before MspI digestion for monitoring the bisul- 
phite conversion rate. After bisulphite treatment, the converted DNA was purified 
using regular zymo spin columns (Zymo Research) with 10 ng tRNA (Roche) as a 
protective carrier, followed by 2 rounds of PCR enrichments. After amplification, 
200-500 bp DNA fragments were size-selected and recovered after resolving on the 
12% native polyacrylamide TBE gel. 
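
Because the spiked-in lambda DNA is unmethylated, every cytosine in reads mapped to it should read as thymine after bisulphite treatment, so any cytosine that still reads as C counts as unconverted. The sketch below shows how a conversion rate could be estimated from such counts; the counts and the function name are hypothetical and are not taken from the paper.

```python
def conversion_rate(converted_c, unconverted_c):
    """Bisulphite conversion rate estimated from unmethylated lambda DNA.

    `converted_c`: cytosine positions read as T (converted).
    `unconverted_c`: cytosine positions still read as C (not converted).
    """
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine positions covered on the spike-in")
    return converted_c / total


# Hypothetical lambda counts for one library.
print(conversion_rate(converted_c=9950, unconverted_c=50))
```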

RRBS libraries of human post-implantation embryos and sperm cells were con- 
structed following the standard RRBS protocol**. 100-bp pair-end sequencing was 
performed on Illumina HiSeq2000/HiSeq2500 platform. 

For RRBS from small numbers of cells or bulk cells (first polar bodies, second polar 
bodies, MII oocytes, sperm, zygotes, 2-cell, 4-cell, 8-cell embryos, morulae, ICM of 
blastocysts, TE of blastocysts, and post-implantation embryos), in total we analysed 
32 samples and generated 168.7 Gb sequencing data. For the single-cell RRBS, in 
total we analysed 35 samples (28 male and female single pronuclei from 14 zygotes 
at four different time points after ICSI, 3 single MII oocytes and 4 single sperm cells) 
and generated 116.4 Gb sequencing data. Note that the RRBS data set of the human 
ES cells was downloaded from Gene Expression Omnibus (accession number: 
GSM822615). Based on the RRBS analyses, we found that similar to that in mice, 
genome-wide demethylation happens after fertilization and the DNA methylation 
level reaches the lowest level at the blastocyst stage in the ICM for both genic 
regions and transposable elements’ (Fig. 1b and Extended Data Fig. 3b). However, 
the major wave of demethylation is completed at the 2-cell stage and then the 
genome-wide DNA methylation remains relatively stable until the morula stage in 
humans. This is compatible with the possibility that after 2-cell stage, DNMT1 and 
UHRF1, which are responsible for maintenance of DNA methylation, may keep the 
DNA methylation stable during cell divisions in the cleavage stage embryos (Extended 
Data Fig. 10g). Since DNMT1 and UHRF1 are expressed at extremely high levels in 
the 2-cell and 4-cell stage embryos, even if only a small fraction of these proteins is 
localized in the nucleus, it may be enough to maintain DNA methylation level, 
similar to the situation in mice**”. 

WGBS library preparation. The WGBS libraries were constructed following the 
standard protocol®”*. Briefly, genomic DNA with 0.5% unmethylated lambda DNA 
(Fermentas) spike-in was first sonicated into 200-300-bp fragments using Covaris 
S2 system. After being concentrated, the sheared DNA was end-repaired, dA-tailed 
and ligated with pre-methylated TruSeq DNA adapters (Illumina). Bisulphite conversion 
was conducted with MethylCode Bisulphite Kit (Invitrogen). After bisulphite 
conversion, the converted templates were PCR amplified and quantified using Qubit 
dsDNA high sensitivity dye (Invitrogen), and standard-curve-based qPCR assay 
(Agilent). The final quality-assured libraries were sequenced on a HiSeq2000 sequencer. 

For WGBS (ICM of blastocysts and post-implantation embryos), in total we ana- 
lysed 2 samples and generated 146.5 Gb of sequencing data. 

Both RRBS and WGBS techniques are based on bisulphite conversion of unmeth- 
ylated cytosine (C), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) into uracil 
(U), 5-formyluracil (5fU) and 5-carboxyluracil (5caU), whereas 5mC and 5hmC 
are kept intact. The methylated cytosines called by RRBS or WGBS are the sum of
5mC and 5hmC, whereas the unmethylated cytosines called are the sum of unmodified
cytosine (C), 5fC and 5caC. So a CpG site that we call demethylated is one at which
demethylation has already gone beyond the 5hmC stage (that is, unmodified C, 5fC or
5caC).
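
The read-out logic described above can be summarised in a short sketch. The names `BISULPHITE_READOUT` and `called_state` are illustrative only and are not taken from the analysis pipeline used in this study.

```python
# Illustrative lookup (hypothetical names): how each cytosine form is read
# after bisulphite conversion and PCR amplification.
BISULPHITE_READOUT = {
    "C": "T",     # unmodified cytosine -> uracil, sequenced as T
    "5fC": "T",   # 5-formylcytosine -> 5-formyluracil, sequenced as T
    "5caC": "T",  # 5-carboxylcytosine -> 5-carboxyluracil, sequenced as T
    "5mC": "C",   # kept intact, sequenced as C
    "5hmC": "C",  # kept intact, sequenced as C
}


def called_state(cytosine_form):
    """Return the call that RRBS or WGBS would make for a cytosine form."""
    base = BISULPHITE_READOUT[cytosine_form]
    return "methylated" if base == "C" else "unmethylated"
```

For example, `called_state("5hmC")` gives `"methylated"`, which is why a site called as demethylated must already have passed the 5hmC stage.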

Sequencing read quality control and alignment. All of the de-multiplexed sequenc- 
ing reads that passed filters were first trimmed to remove the low-quality bases and 
adaptor sequences. All of the RNA-seq reads were first aligned to the hg19 RefSeq 
reference genome (downloaded from the UCSC genome browser), and gene expression
profiles of each sample were calculated using the well-defined RPKM (reads
per kilobase of transcript per million mapped reads) method to eliminate the effects
of different sequencing depths and large variation in lengths among different
transcripts. We used only the expression profiles of known RefSeq genes for the
subsequent analysis. As for the expression profiling of human retro-elements, we used
BWA tools to align all of the filtered ‘clean’ data to the human genome assembly hg19
(downloaded from the UCSC genome browser) with the command “aln -o 1 -e 60
-i 15 -q 10 -t 8”, and used the number of mappable reads normalized by the
total number of mappable RefSeq reads as the expression level of each annotated
retro-element, counting only the mappable reads located in the corresponding element.
As for the bisulphite sequencing, including WGBS and RRBS, the remaining paired- 
end reads were aligned to the human reference genome (hg19, downloaded from 
the UCSC genome browser) using Bismark tools (version 0.7.6) with default
parameters. Additionally, the 48,502 bp lambda DNA genome was rebuilt as an extra
reference for later calculating the bisulphite conversion rate of each sample. 
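
As a minimal sketch of the two normalizations mentioned in this section, the functions below compute RPKM and a bisulphite conversion rate from the lambda spike-in. The function names are hypothetical, and the conversion-rate formula (fraction of lambda cytosines read as T) is an assumption, because the text does not spell out how the rate was derived.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def bisulphite_conversion_rate(lambda_c_calls, lambda_t_calls):
    """Assumed definition: share of unmethylated lambda cytosines read as T."""
    total = lambda_c_calls + lambda_t_calls
    if total == 0:
        raise ValueError("no cytosine calls on the lambda genome")
    return lambda_t_calls / total
```

With 500 reads on a 2,000-bp transcript and 10 million mapped reads, `rpkm(500, 2000, 10_000_000)` evaluates to 25.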

SNP identification and allele assignment. We used the WGBS data of post- 
implantation stage to detect the ASM events. After the alignment of sequencing data 
to the hg19 reference genome, we used Bis-SNP tools to call SNPs with a likelihood
ratio between the best and second-best genotypes of no less than 20, a mapping quality
of mapped reads of more than 30 and a base quality for genotyping of more than 5. As the
version of the Bismark alignment tool that we used does not support gapped alignment,
we used the indel realignment function of the Bis-SNP program to improve the SNP
calling accuracy. We further filtered SNPs following the default pipeline in the
Bis-SNP user guide, using the “VCFpostprocess” function. SNPs that do not occur in the
dbSNP database and that are covered by fewer than five unique reads were discarded.
Next, we assigned sequencing reads to each allele on the basis of the retained
heterozygous SNPs, using a customized Perl script.

Identification of the methylation levels of CpG and non-CpG sites. When we 
analysed the methylation level of each CpG site we covered, the following algorithm 
was applied: the number of reported C (‘methylated’ reads) divided by the total 
number of reported C (‘methylated’ reads) and T (‘unmethylated’ reads) at the same 
positions of the reference genome is calculated as the DNA methylation level of
the CpG site. Every CpG site with read depth ≥ 1 was counted towards the
total CpG coverage of the sample. However, when we quantified the DNA methylation
level of each sample, only the CpG sites with read coverage of more than five times
were taken into consideration, and we applied the 100-bp-tile-based DNA methylation
calling algorithm. First, we binned the genome into consecutive 100-bp tiles. The
number of reported C, divided by the total number of reported C and T captured in 
the 100-bp tiles, is interpreted as the 100-bp-tile averaged DNA methylation level. 


The DNA methylation level of each sample is the average of the 100-bp tiles we 
have covered in our RRBS assay, while the DNA methylation level of each reported 
stage is the arithmetic average value of all biological replicates across each stage. 
The CpG density of every CpG site was calculated as the total numbers of all CpG 
dinucleotides located within 50 bp upstream and 50 bp downstream of this CpG 
site, whereas the CpG density of every 100-bp tile was then calculated as the aver- 
aged CpG density of all CpG sites located in this 100-bp tile. 
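
The per-site, per-tile and CpG-density calculations described above can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, the coverage cutoff is read as depth ≥ 5, and the CpG density counts the site itself together with its neighbours within 50 bp on either side.

```python
from collections import defaultdict

TILE_SIZE = 100    # bp, as stated in the text
MIN_COVERAGE = 5   # assumption: the coverage cutoff is read as depth >= 5


def site_methylation(c_reads, t_reads):
    """Methylation level of one CpG site: C / (C + T)."""
    return c_reads / (c_reads + t_reads)


def tile_methylation(sites, tile_size=TILE_SIZE, min_coverage=MIN_COVERAGE):
    """Pool C and T calls per consecutive tile on one chromosome.

    sites: iterable of (position, c_reads, t_reads).
    Returns {tile_start: pooled C / (C + T)} for tiles with usable sites.
    """
    pooled = defaultdict(lambda: [0, 0])  # tile_start -> [C calls, C + T calls]
    for position, c_reads, t_reads in sites:
        depth = c_reads + t_reads
        if depth < min_coverage:
            continue  # skip poorly covered sites
        start = (position // tile_size) * tile_size
        pooled[start][0] += c_reads
        pooled[start][1] += depth
    return {start: c / n for start, (c, n) in pooled.items()}


def sample_methylation(tiles):
    """Sample-level methylation: mean over all covered tiles."""
    if not tiles:
        raise ValueError("no covered tiles")
    return sum(tiles.values()) / len(tiles)


def cpg_density(position, cpg_positions, flank=50):
    """Number of CpGs within `flank` bp of a site (site itself included)."""
    return sum(1 for p in cpg_positions if abs(p - position) <= flank)
```

Pooling counts within a tile, rather than averaging per-site ratios, follows the wording "the number of reported C, divided by the total number of reported C and T captured in the 100-bp tiles".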

For non-CpG methylation, the same strategy was applied unless indicated other- 
wise. As the non-CpG methylation levels are much lower than the CpG methylation 
levels, we used the single-base-resolution data instead of the sliding-window-based
approach to calculate the non-CpG methylation level: we averaged the
methylation levels of all the non-CpG sites in each stage that were covered
more than five times.

Identification of ASM. As CpG methylation events always happen symmetric- 
ally, the bisulphite-converted reads from the forward and reverse strands covering 
the same symmetric CG site were combined for analysis. Only the CpG dinucleotides
covered by at least three reads on each allele (regardless of whether the reads
originated from the forward or reverse strand) were selected for further analysis. CpG
sites were identified as showing ASM if their methylation levels were higher
than 75% in one allele and lower than 25% in the other allele. The allelic read
pairs (sharing the same SNP information) harbouring at least one ASM CpG site
were then extracted to calculate the methylation levels of the ASM regions. An ASM
region was defined for each heterozygous SNP with at least three ASM CpG sites
identified using this SNP, provided that the methylation level of the region was
higher than 75% in one allele and lower than 25% in the other allele.
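
The ASM criteria above can be expressed as a small sketch. The function names are hypothetical, and the region-level methylation is simplified here to pooled counts over the CpG sites linked to one heterozygous SNP, whereas the text computes it from the extracted allelic read pairs.

```python
def allele_level(c_reads, t_reads):
    """Methylation level of one allele at one CpG: C / (C + T)."""
    return c_reads / (c_reads + t_reads)


def is_asm_site(allele_a, allele_b, min_reads=3, high=0.75, low=0.25):
    """allele_a, allele_b: (c_reads, t_reads), both strands combined."""
    if sum(allele_a) < min_reads or sum(allele_b) < min_reads:
        return False  # each allele needs at least three reads
    a = allele_level(*allele_a)
    b = allele_level(*allele_b)
    return (a > high and b < low) or (b > high and a < low)


def is_asm_region(sites, min_asm_sites=3, high=0.75, low=0.25):
    """sites: list of (allele_a, allele_b) count pairs linked to one SNP."""
    n_asm = sum(1 for a, b in sites if is_asm_site(a, b))
    if n_asm < min_asm_sites:
        return False
    # Simplification: pool counts over all linked sites for the region level.
    a_c = sum(a[0] for a, _ in sites)
    a_t = sum(a[1] for a, _ in sites)
    b_c = sum(b[0] for _, b in sites)
    b_t = sum(b[1] for _, b in sites)
    a = allele_level(a_c, a_t)
    b = allele_level(b_c, b_t)
    return (a > high and b < low) or (b > high and a < low)
```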

Identification of dynamically methylated tiles and gamete specific differentially 
methylated regions (DMRs). After quantifying the 100-bp-tile DNA methylation 
levels using 100-bp-tile-based methylation calling algorithm, we systematically 
compared the DNA methylation levels of 100-bp tiles which were covered in both 
MII oocytes and sperm. We assigned these 100-bp tiles as gamete-specific DMRs
only if the methylation level of a tile was greater than 75% in one type of gamete
and less than 25% in the other, with a significant P ≤ 0.05 given by
multiple Student’s t-tests and a Benjamini-Hochberg false discovery rate (FDR) ≤ 0.05.
Additionally, if the methylation level difference of a 100-bp tile between a pair of
consecutive stages exceeded 40%, with significance in an FDR-corrected Fisher’s exact
test (P ≤ 0.05, Benjamini-Hochberg FDR ≤ 0.05), the tile was classified as a
changing tile; the remaining tiles were considered stable tiles. The
dynamically methylated tiles with DNA methylation levels that increased or decreased
across pairwise consecutive stages were assigned as increasing or decreasing tiles,
respectively. Other tiles with DNA methylation levels of more than 75% or less than
25% in both sperm and oocytes were assigned as hypermethylated or
hypomethylated tiles in both gametes, respectively. The hypergeometric probability
distribution model was then applied to analyse which genomic regions were most
significantly enriched in these tiles (Extended Data Fig. 7c).
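
The classification of changing and stable tiles can be illustrated with a self-contained sketch. It assumes that Fisher's exact test is applied to the pooled C and T read counts of a tile at two consecutive stages, which the text does not state explicitly; all function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_tiles(tiles, min_diff=0.40, alpha=0.05):
    """tiles: list of ((c1, t1), (c2, t2)) counts at two consecutive stages."""
    p_values = [fisher_exact_two_sided(c1, t1, c2, t2)
                for (c1, t1), (c2, t2) in tiles]
    q_values = benjamini_hochberg(p_values)
    labels = []
    for ((c1, t1), (c2, t2)), p, q in zip(tiles, p_values, q_values):
        diff = c2 / (c2 + t2) - c1 / (c1 + t1)
        if abs(diff) > min_diff and p <= alpha and q <= alpha:
            labels.append("increasing" if diff > 0 else "decreasing")
        else:
            labels.append("stable")
    return labels
```

A tile that falls from 90% to 10% methylation with 20 reads at each stage is labelled `"decreasing"`, whereas a change from 50% to 55% stays `"stable"` whatever its P value, because the 40% difference cutoff is not met.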

We systematically searched and identified the transient methylation imprinting 
regions in our RRBS data set, which showed different methylation levels in gametes, 
and kept about 50% methylation from 2-cell embryos to blastocysts. Among the
gamete-specific DMRs, about 1,400 of the 12,145 oocyte-specific DMRs
were transiently imprinted until the blastocyst stage. By contrast, only 140 of
the 17,473 sperm-specific DMRs were transiently imprinted. Most of these transient
imprints gained further methylation after implantation (Extended Data Fig. 7d, e).

Human genomic region annotations and calculation of their DNA methylation
levels. Human promoters were first classified into three classes according to
their CpG densities and observed to expected CpG ratios. High-density CpG pro- 
moter (HCP), intermediate-density CpG promoter (ICP), and low-density CpG 
promoter (LCP) classes were annotated as previously published and then
converted to the hg19 reference build with the UCSC liftOver tool. All of the
enhancers used in this study were identified from published data sets. We integrated
all of the candidate enhancers by calling the peaks of H3K4me1 in human ES cells
(GSM466739) and 12 other types of human tissues (adipose nuclei (GSM772757),
adult kidney (GSM773001), adult liver (GSM537706), duodenum mucosa
(GSM621403), duodenum smooth muscle (GSM772837), pancreatic islet (GSM537642),
skeletal muscle (GSM621640), stomach smooth muscle (GSM621642), fetal brain 
(GSM706850), fetal heart (GSM706848), fetal kidney (GSM621409), and fetal lung 
(GSM706853)), and then filtered out the regions that overlapped with the potential
promoters (−1 kb to +0.5 kb around the TSS, or H3K4me3 peak regions). These
putative enhancers, either poised or active, were selected for further study. Other
chromatin modification data for human ES cells, namely H3K27me3
(GSM733748), H3K9me3 (GSM1003585) and H3K4me3 (GSM733657), were downloaded
directly from the ChIP-seq data set of the ENCODE database. The annotated
retro-elements, such as LINEs, SINEs and LTRs and their subfamilies, were downloaded
from the RepeatMasker track of the UCSC genome browser. Other regions, such as
CGIs, exons and introns, were downloaded from the UCSC tables for hg19. As
previously reported, intragenic regions were defined as spanning from TSS to TES, while the
intergenic regions were defined as the complement of intragenic regions in the human 
genome. For each annotated genomic region, the DNA methylation level was cal- 
culated as the average DNA methylation levels of all CpG sites covered within the 
region with more than fivefold coverage. Additionally, when quantifying the meth- 
ylation level of each gene promoter region, only those promoters with at least 5 CpG 
sites covered were retained for further analysis. 
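
A minimal sketch of the region-level calculation follows. The function names and the half-open coordinate convention are assumptions made for illustration, and the coverage cutoff is again read as depth ≥ 5.

```python
def region_methylation(cpg_sites, region_start, region_end, min_coverage=5):
    """Mean methylation of sufficiently covered CpG sites inside a region.

    cpg_sites: iterable of (position, c_reads, t_reads).
    The region is treated as half-open: region_start <= position < region_end.
    Returns (mean_level, n_sites), or (None, 0) if no site qualifies.
    """
    levels = [c / (c + t) for position, c, t in cpg_sites
              if region_start <= position < region_end
              and c + t >= min_coverage]
    if not levels:
        return None, 0
    return sum(levels) / len(levels), len(levels)


def promoter_methylation(cpg_sites, start, end, min_sites=5):
    """Promoter level, kept only if at least `min_sites` CpGs are covered."""
    level, n_sites = region_methylation(cpg_sites, start, end)
    return level if n_sites >= min_sites else None
```

Unlike the tile-based calling, this averages per-site levels rather than pooling read counts, following the wording "the average DNA methylation levels of all CpG sites covered within the region".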

We downloaded the previously published bisulphite sequencing data of the human 
ES cells (Gene Expression Omnibus accession number: GSM822615), and quantified 
the methylation level using the same 100 bp sliding-window-based method. The 
methylation levels of the derived human ES cells are much higher than those in the
ICM (Extended Data Fig. 9f). This is compatible with the possibility that human ES
cells are more similar to post-implantation epiblast cells (similar to mouse EpiSCs,
but relatively different from mouse ESCs), although human ES cells are derived from
the ICM of blastocysts. This is also compatible with our previous finding that the
transcriptome of human ES cells is similar to but distinct from that of the ICM (with
about 1,500 genes showing differential expression between the ICM and human ES
cells).

For repeat elements, L1PA, L1PB, L1MA, L1MB, L1MC, L1MD and L1ME are
subfamilies within the same L1 family, with L1PA evolutionarily youngest and
L1MC, L1MD and L1ME oldest. We found that the residual DNA methylation level
of L1PA is in general higher than that of L1MC, L1MD and L1ME. This supports
our conclusion that the evolutionarily younger LINEs are demethylated to a milder
extent than older ones in the same family and have a higher abundance of
transcripts, implying that early embryos tend to keep higher residual methylation
at the evolutionarily younger and perhaps more active transposable elements.
For the subfamilies of Alu, the residual methylation levels are
comparable, either because the differences are too subtle to be detected at the
current measurement accuracy or because there is no difference between the subfamilies.

The integrated analysis of gene expression and DNA methylation. The log2 of
the gene expression levels (RPKM) of RefSeq genes were first calculated, and genes 
with RPKM less than 0.001 were reset to 0.001. As for the DNA methylation levels, 
we only displayed the methylation level of promoters of each expressed gene. The 
Pearson correlation coefficients (r) between gene expression and DNA methylation 
levels of promoters of corresponding genes were calculated using a customized Perl 
script (Fig. 3c). The gene ontology (GO) analysis was done for the following genes: 
(1) the expression level decreased more than twofold from ICM to post-implantation
stages (t-test; P ≤ 0.05); (2) the absolute DNA methylation level of their promoters
increased by more than 20% (that is, level of 5mC in post-implantation − level of 5mC
in ICM ≥ 20%) (Fig. 3d). The GO analysis was performed with DAVID online
(http://david.abcc.ncifcrf.gov/).
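
The expression transform and the correlation step can be sketched as below. The original analysis used a customized Perl script; this Python version is an illustrative stand-in with hypothetical function names.

```python
from math import log2, sqrt


def log2_rpkm(rpkm, floor=0.001):
    """log2 of RPKM, with values below the floor reset to the floor."""
    return log2(max(rpkm, floor))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

In use, `xs` would hold the promoter methylation levels and `ys` the `log2_rpkm` values of the corresponding genes at one developmental stage.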

The integrated analysis of signal intensities of histone modifications and DNA
methylation. We took advantage of the full ChIP-seq data set in the ENCODE
database of the histone modifications in human ES cells (H3K27me3 (GSM733748),
H3K9me3 (GSM1003585), H3K4me3 (GSM733657)), and made use of the signal
intensity of each chromatin modification at each well-defined peak (StdSig.bigwig
and broadPeak.bed files). For every called histone modification peak, we summed
the DNA methylation levels of all CpG sites (read depth ≥ 5) located in the peak and
divided by the number of these CpG sites to obtain the DNA methylation level of
the peak. We plotted the Pearson correlation coefficients (r) between
peak signal intensity of each type of histone modification and DNA methylation 
level of each significant peak across every developmental stage using a customized 
Perl script (Extended Data Fig. 9a-c).

Extended Data Figure 1 | DNA methylation dynamics during human early
embryonic development. a, Morphology of human early embryos used in this 
study. The microscopy images of laser assisted biopsy of polar bodies (PB), 
mature oocytes, and the first polar bodies, zygotes, 2-, 4-, 8-cell stage embryos, 
morula, inner cell mass (ICM) and trophectoderm cells (TE) from blastocyst 
stage embryos. Notably, the zona pellucida of all embryos, as well as mature
oocytes, was removed to avoid any possible contaminants. Scale bar, 50 μm.

b, Pearson correlation heat map of DNA methylomes at different 
developmental stages of human early embryos. The numbers in the sample 
names indicate different biological replicates of the same developmental stages. 
The colour key from green to red indicates the correlation coefficient from low 
to high, respectively. c, The averaged non-CpG DNA methylation levels of MII
oocytes, the first (1st PB) and second (2nd PB) polar bodies, along the gene
bodies and 15 kb upstream of the transcription start sites (TSS) and 15 kb 
downstream of the transcription end sites (TES) of all RefSeq genes. 

d, Non-CpG methylation levels across different human early embryonic stages. 
The red bars indicate the averaged DNA methylation levels of the non-CpG 
sites (CHG and CHH), while the grey bars indicate the bisulphite non-conversion
rate of corresponding samples. All data are mean ± 95% confidence
interval (± 1.96 s.e.m.). Details of biological replicates of each stage are listed in
Supplementary Table 1. e, The positive correlation of non-CpG methylation 
levels of the gene bodies and the expression levels of corresponding genes in 
MII oocytes. R indicates the Pearson correlation coefficient.

Extended Data Figure 2 | The general characteristics of the DNA
methylation patterns during human early embryonic development. a, The
heat map view of a representative section of chromosome 1 showing the
dynamics of DNA methylomes across different developmental stages. The
green bars in the right panel indicate the average methylation levels of the
corresponding regions. The window size for the DNA methylation level
calculation and presentation of these green bars is the single CpG dinucleotide
covered (at least five times) in these corresponding regions. b, Histogram of the
numbers of changing (royal blue) and stable (sky blue) tiles between
consecutive stages, which shows the major transitions in DNA methylation
levels during human early embryonic development. c, Histogram of the
numbers of DNA tiles showing increasing (magenta) and decreasing (cyan)
levels of DNA methylation between consecutive stages of human early
embryos. d, Histogram of the fractions of tiles with 0-20%, 20-40%, 40-60%,
60-80% and 80-100% methylation levels across different developmental stages.
e, The distribution of high/intermediate (methylation level ≥ 0.2) and low
(methylation level < 0.2) methylation tiles at each developmental stage against
CpG density. f, Histogram of the counts of 100 bp tiles with different
methylation levels. n means the total number of the 100 bp tiles for each stage.
g, Box plots of methylation levels of each stage across local CpG densities.

Extended Data Figure 3 | The dynamic changes of DNA methylation on a
variety of annotated genomic regions. a, Histograms of the numbers of tiles
with increasing (magenta) and decreasing (cyan) DNA methylation between
each pair of consecutive stages in the annotated genomic regions during human
early embryonic development. b, The line charts of the average methylation
levels of annotated genomic regions during human early embryonic
development. The green dot at gamete stage indicates the average DNA
methylation levels of the corresponding regions in sperm, while the purple dot
at gamete stage indicates those in MII oocytes. HCP, high-density CpG
promoter; ICP, intermediate-density CpG promoter; LCP, low-density CpG
promoter; annotations as previously published. All data are mean ± 95%
confidence interval (± 1.96 s.e.m.). Details of biological replicates are listed in
Supplementary Table 1.

Extended Data Figure 4 | DNA methylation patterns in oocytes, polar
bodies, sperm, blastocysts and post-implantation embryos. a, Histograms of
the averaged DNA methylation levels of sperm (n = 4), ICM of blastocysts
(n = 3), and post-implantation embryos (liver, n = 3) covered by both RRBS
and WGBS data sets (WGBS data set of sperm was downloaded from Molaro,
A. et al.). b, Comparison of the averaged DNA methylation levels along the
gene bodies and 15 kb upstream of the transcription start sites (TSS) and 15 kb
downstream of the transcription end sites (TES) of all RefSeq genes,
respectively. It was analysed by WGBS for sperm (yellow line, WGBS data set
of sperm was downloaded from Molaro, A. et al.), ICM of blastocysts
(purple line) and post-implantation embryos (liver; black line). c, Histograms
of the average methylation levels in different genomic regions in MII oocytes
(n = 2) as well as the first and second polar bodies (the first polar bodies, n = 2;
the second polar bodies, n = 2). d, Histograms of the average methylation
levels in different genomic regions in ICM (n = 3) and TE (n = 3) isolated from
the late blastocysts. All data in panels a, c and d are mean ± 95% confidence
interval (± 1.96 s.e.m.).

Extended Data Figure 5 | The demethylation patterns of maternal and
paternal genomes on a variety of annotated genomic regions. a, Pearson 
correlation heat map of DNA methylomes of individual male and female 
pronuclei as well as the gametes. The colour key from green to red indicates the 
correlation coefficient from low to high, respectively. The unsupervised 
clustering result shows that the single male pronuclei and sperm clustered 
together, while the single female pronuclei and MII oocytes clustered together. 
b, Discrimination of individual male and female pronuclei by analysing MII
oocyte-specific and sperm-specific DMRs. The sperm-specific DMRs (17,096
100 bp tiles) and MII oocyte-specific DMRs (11,850 100 bp tiles) covered by 
single-cell RRBS data set were used as the criterion for judging individual male 
and female pronuclei isolated from the same zygotes. The colour key from 
green to red indicates the DNA methylation levels from low to high, 
respectively. c, Demethylation dynamics of maternal and paternal genomes in 
human zygotes analysed by single pronucleus RRBS analysis in different 
annotated genomic regions.

Extended Data Figure 6 | DNA demethylation patterns in pronuclear stage
embryos analysed by immunostaining. a-f, The immunostaining of 5mC, 
5hmC and H3K9me3 in human oocytes (a), zygotes (2PN) (b-e) and 2-cell 
embryos (f). The green and red signals identified from the staining indicate the 
5hmC and 5mC modifications, respectively (a, c-f). Male and female symbols 
in the merged panels indicate the male and female pronuclei. The white triangle 
symbols indicate the polar bodies. b, The immunostaining of 5mC and
H3K9me3 in human zygotes. Human zygotes were co-stained with 5mC (red)
and H3K9me3 (green, pronuclei with intense H3K9me3 signals were female 
pronuclei). g, The immunostaining of 5mC and 5hmC in mouse zygotes and 
2-cell embryos as controls. The green and red signals identified from the 
staining indicate the 5hmC and 5mC modifications, respectively. Male and 
female symbols in the merged panels indicate the male and female pronuclei.
The white triangle symbols indicate the polar bodies.

Extended Data Figure 7 | DNA methylation changes of DMR regions and
non-DMR regions of the human gametes during pre- and post-
implantation embryonic development. a, Box plots of DNA methylation
levels for hypermethylated 100 bp tiles (average methylation levels ≥ 75%) in
both gametes across early embryonic development stages. b, Box plots of DNA
methylation levels for hypomethylated 100 bp tiles (average methylation levels
≤ 25%) in both gametes across early embryonic development stages. c, The
hypergeometric enrichment analysis of the hypermethylated and
hypomethylated tiles in both gametes, which exhibited strong enrichment for
different genomic regions (hypergeometric enrichment test). d, Box plot of
methylation levels of oocyte-specific DMRs across different developmental
stages. e, Box plot of methylation levels of sperm-specific DMRs across different
developmental stages. f, The bar plot showing the numbers of gamete-specific
DMRs located in different genomic regions, which indicates the strong
enrichment for different regions in sperm-specific DMRs (blue bars) and
oocyte-specific DMRs (red bars).

Extended Data Figure 8 | The DNA methylation patterns of imprinting
genes and ASM regions during human early embryonic development.
a, A representative locus of a known paternal imprinting gene, H19, covered in
our RRBS data set. The blue bars indicate the DNA methylation levels of
different CpG sites. The region was unmethylated in MII oocytes, fully
methylated in sperm cells and around 50% methylated in cleavage embryos and
post-implantation embryos. b, A representative locus of a potential novel
imprinting region within the gene body of IFI6, covered in our RRBS data set,
which was fully methylated in MII oocytes, unmethylated in sperm, around
50% methylated in cleavage embryos and post-implantation embryos. c, d, Two
allele-specific methylation loci on chromosome 16 (c) and chromosome
2 (d), tracked with heterozygous SNPs to distinguish their allele origins.
The paired reads generated from the WGBS data sets with heterozygous SNPs
were selected to show the DNA methylation levels of the two alleles.

Extended Data Figure 9 | The relationship of DNA methylation, histone
modification and RNA expression during human early embryonic 
development. a-c, The correlation between signal intensities of three types of 
histone marks (H3K27me3, in panel a; H3K9me3, in panel b and H3K4me3, 
in panel c) and the DNA methylation levels of corresponding peak regions 
during human early embryonic development. The horizontal axis from left to 
right of each panel represents the peak regions of histone modifications, ranked 
by their signal intensities from high to low. d, The scatter plot of DNA 
methylation levels of promoter regions (HCP, ICP and LCP) and the relative 
expression levels of corresponding RefSeq genes. Log2 values of the gene
expression levels (RPKM) are given. The Pearson correlation coefficients (r) 
between DNA methylation levels of promoter regions and the scaled expression 
levels of the corresponding genes across different early embryonic stages 
were calculated and are shown in the top right corner of each panel. The red and 
blue fitting curves represent gene expression levels and DNA methylation levels
of corresponding promoter regions, respectively. The genes were arranged
according to their expression levels. e, The scatter plot of DNA methylation 
levels of gene bodies and the relative expression levels of corresponding RefSeq 
genes. Log2 values of the gene expression levels (RPKM) are given. The Pearson
correlation coefficients (r) between DNA methylation levels of gene body 
regions (blue lines) and the scaled expression levels (red lines) of the 
corresponding genes across different early embryonic stages were calculated 
and are included in the top right corner of each panel. The red and blue fitting 
curves represent gene expression levels and DNA methylation levels in 
corresponding gene body regions, respectively. The genes were arranged 
according to their expression levels. f, Histograms of the average DNA 
methylation levels in different genomic regions between ICM replicates (n = 3) 
and human ES cells (GSM822615, n = 2) by RRBS, which showed generally 
higher methylation levels in human ES cells than the ICM of the blastocysts. 
All data are mean ± 95% confidence interval (± 1.96 s.e.m.).

Extended Data Figure 10 | The relationship of DNA methylation and RNA
expression of transposable elements during human pre- and post- 
implantation embryonic development. a, The line chart of the relative 
expression levels (sequencing read counts, normalized by total mappable 
RefSeq read counts) of SVAs. Notably, the expression levels of SVAs increased 
dramatically from 4-cell stage to morula stage. Biological replicates in panel 
a: MII oocyte (n = 3), zygote (n = 3), 2-cell (n = 6), 4-cell (n = 12), 8-cell
(n = 20), morula (n = 14), ICM (n = 10), post-implantation (n = 3). b, The
average DNA methylation levels of SVAs. The green dot in gamete stage 
indicates the average DNA methylation level of the corresponding regions in 
sperm, while the purple dot in gamete stage indicates that in MII oocytes, 
respectively. Details of biological replicates of each stage are listed in 
Supplementary Table 1. c, The line chart of the relative expression levels 
(sequencing read counts, normalized by total mappable RefSeq read counts) of 
four major subfamilies (ERV1, ERVK, ERVL and ERVL-MaLR) of LTRs during 
early embryonic development. Biological replicates in panel c: MII oocyte
(n = 3), zygote (n = 3), 2-cell (n = 6), 4-cell (n = 12), 8-cell (n = 20), morula
(n = 14), ICM (n = 10), post-implantation (n = 3). d, The average DNA 
methylation levels of four major subfamilies of LTRs during early embryonic 
development. The green dot in gamete stage indicates the average DNA
methylation level of the corresponding regions in sperm, while the purple dot in
gamete stage indicates that in MII oocytes. Details of biological replicates of 
each stage are listed in Supplementary Table 1. e, DNA methylation levels of the 
subfamilies of Alu, including AluY (the evolutionarily youngest one, the left 
panel), AluS (the middle panel) and AluJ (the evolutionarily oldest one, the 
right panel). Details of biological replicates of each stage are listed in 
Supplementary Table 1. f, DNA methylation levels of the subfamilies of L1, 
including L1PA (the evolutionarily youngest one in L1 family), L1PB, L1MA,
L1MB, L1MC, L1MD and L1ME (the evolutionarily oldest one in L1 family). 
The green and red dots represented sperm and MII oocytes, respectively. 
Details of biological replicates of each stage are listed in Supplementary Table 1. 
g, Histograms of expression levels (RPKM) of DNA methylation-related 
genes across different human early embryonic stages, including DNA-
demethylation-related genes TET1, TET2, TET3 and TDG, as well as
DNA-methylation-related genes DNMT1, UHRF1, DNMT3A, DNMT3B and 
DNMT3L. Biological replicates in panel g: MII oocyte (n = 3), zygote (n = 3), 
2-cell (n = 6), 4-cell (n = 12), 8-cell (n = 20), morula (n = 16), blastocyst
(n = 30). All data in panels a-g are mean ± 95% confidence interval
(± 1.96 s.e.m.).

DNA methylation dynamics of the human preimplantation embryo

Zachary D. Smith, Michelle M. Chan, Kathryn C. Humm, Rahul Karnik, Shila Mekhoubad, Aviv Regev,
Kevin Eggan & Alexander Meissner


In mammals, cytosine methylation is predominantly restricted to 
CpG dinucleotides and stably distributed across the genome, with 
local, cell-type-specific regulation directed by DNA binding factors.
This comparatively static landscape is in marked contrast with the 
events of fertilization, during which the paternal genome is globally 
reprogrammed. Paternal genome demethylation includes the majority 
of CpGs, although methylation remains detectable at several notable
features. These dynamics have been extensively characterized
in the mouse, with only limited observations available in other mam- 
mals, and direct measurements are required to understand the extent 
to which early embryonic landscapes are conserved. We present
genome-scale DNA methylation maps of human preimplantation 
development and embryonic stem cell derivation, confirming a tran- 
sient state of global hypomethylation that includes most CpGs, while 
sites of residual maintenance are primarily restricted to gene bodies. 
Although most features share similar dynamics to those in mouse, 
maternally contributed methylation is divergently targeted to species- 
specific sets of CpG island promoters that extend beyond known 
imprint control regions. Retrotransposon regulation is also highly 
diverse, and transitions from maternally to embryonically expressed 
elements. Together, our data confirm that paternal genome demeth- 
ylation is a general attribute of early mammalian development that 
is characterized by distinct modes of epigenetic regulation. 

We generated genome-scale methylation maps of human preimplantation
development using reduced representation bisulphite sequencing
(RRBS) to accommodate minimal DNA inputs. We thawed and
screened morphologically normal cleavage stage embryos and blasto- 
cysts to represent early and late preimplantation, two replicates of pooled, 
matched inner cell mass (ICM) and trophectoderm, motile sperm from 
four unrelated healthy donors, and fetal tissues (Extended Data Fig. 1). 
To estimate the time, extent and targets of global remethylation, we 
generated derivation time series of three human embryonic stem (ES) 
cell lines, collecting the primary outgrowth, first and fifth passage per 
line. On average, replicates showed high reproducibility and captured 
1,753,958 CpGs of methylation data at 10x coverage (Extended Data 
Fig. 2). 

We noted two distinguishable architectures for DNA methylation 
across this time series: somatic-like CpG-density-dependent bimodality 
in sperm, ES cells and fetal tissues, and extensive CpG-density-independent 
hypomethylation in preimplantation embryos (Fig. 1a). The substan- 
tial intermediate methylation in sperm reflects disparate repetitive ele- 
ment regulation, though non-repetitive sequences still fit the somatic 
paradigm (Fig. 1b). Almost no hypermethylated CpGs persist into
cleavage, with residual methylation diminishing further into the blas- 
tocyst, indicating that the embryonic landscape is rapidly established 
before the third embryonic division (Fig. 1b and Extended Data Fig. 2f). 


Without access to human epiblasts, in vivo characterization of global 
remethylation is unavailable, but cross-species comparison between mouse
epiblast and human ES cells suggests that the latter are a reasonable proxy for
postimplantation pluripotency (Extended Data Fig. 3). Notably, within 
primary ES cell outgrowths, global remethylation is nearly complete, 
including for intermediately methylated repetitive elements apparent 
in sperm (Fig. 1b). 

Despite predominant hypomethylation, erasure is not the de facto 
fate of all loci. Non-repetitive 100-bp genomic tiles were clustered using 
k-means into 10 dynamic patterns. Of sperm hypermethylated tiles, 45% 
retain some methylation over preimplantation, and 23% display high 
enough levels in cleavage embryos to be biparentally inherited and at 
least partially maintained (Fig. 1c). Local maintenance is significantly 
weighted to gene bodies: only 31% of 40,486 sperm hypermethylated 
intergenic tiles are ≥0.2 methylated in cleavage embryos, compared to
57% or 59% of exons or introns, respectively (Fig. 1d, e). Frequently, 
gene body methylation extends for tens to hundreds of kilobases within 
a single gene (Fig. 1f). Sites of retained embryonic methylation suggest 
residual DNA methyltransferase activity within a phase where maintenance
appears otherwise impeded.
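
The clustering step described above can be illustrated with a minimal sketch. It assumes each non-repetitive tile is represented as a vector of methylation values across stages (here sperm, cleavage, blastocyst and ES cell) and uses a plain Lloyd-style k-means; the toy values, the seeded initialization and the function names are illustrative assumptions, not the pipeline used in the study, which clustered genome-scale data into 10 patterns.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two methylation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly chosen tiles.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each tile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical tiles: [sperm, cleavage, blastocyst, ES cell] methylation.
tiles = [
    [0.90, 0.60, 0.50, 0.90],  # sperm hypermethylated, partially maintained
    [0.95, 0.55, 0.50, 0.85],
    [0.90, 0.50, 0.45, 0.90],
    [0.90, 0.10, 0.05, 0.80],  # sperm hypermethylated, demethylated
    [0.85, 0.05, 0.05, 0.90],
    [0.95, 0.10, 0.00, 0.85],
]
labels, centroids = kmeans(tiles, k=2)
```

On these toy tiles the two trajectories (maintained versus demethylated) separate into distinct clusters; with real data the number of clusters and the initialization would need to be chosen and checked for stability.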

We incorporated recently published RNA-sequencing (RNA-seq) data 
to interpret the relationship between DNA methylation and expression
(Supplementary Table 1). Despite global hypomethylation, the canonical 
negative correlation between promoter methylation and gene expres- 
sion extends to preimplantation, although the overall range in promoter 
methylation is contracted (Extended Data Fig. 4a). Few demethylated 
promoters are transcribed, suggesting that promoter demethylation 
largely reflects the global trend (Extended Data Fig. 4b). However, deme- 
thylated promoters are more frequently induced than repressed and include 
POU5F1 (also known as OCT4), whose embryonic induction is essential
for development (Extended Data Fig. 4c, d). Thus, our preimplantation
data support models where distinct mechanisms may regulate
global versus targeted reprogramming.

The global DNA methylation dynamics of the human embryo closely 
mirror those of the mouse, with sharp transitions both into and out of 
preimplantation (Fig. 2a). We investigated the behaviour of orthologous
exons within the mouse preimplantation timeline, predicated
on their human dynamics. Surprisingly, simply sorting mouse exons 
according to the dynamics of their human orthologues recapitulated 
similar trends in sperm and over preimplantation (Fig. 2b). In mouse, 
demethylated exons are erased early, with moderate passive depletion 
over cleavage (Fig. 2c, d). Alternatively, exons that maintain methyla- 
tion behave similarly in mouse and are hypermethylated in both gam- 
etes (Fig. 2c, d). Intron dynamics revealed comparable trends, but only 
after repetitive elements were removed from methylation estimates 
(Extended Data Fig. 5). Thus, both species pass through an equivalent 
global reprogramming with congruent kinetics, while many orthologous
regions maintain methylation that decays passively. 

Transient, maternally inherited monoallelic methylation has been 
previously observed in mouse. To identify candidate loci in human
without access to oocytes, we searched for regions that are significantly 
more methylated in preimplantation embryos than in sperm, as this meth- 
ylation would be likely to be of maternal origin (Extended Data Fig. 6a 
and Methods). Using our criteria, we identify 5,265 100-bp candidate 
maternal differentially methylated regions (DMRs), including most canon- 
ical ICRs. We clustered these regions by their resolution in ES cells and 
found that the majority are preimplantation-specific and either hyper- 
or hypomethylated in somatic tissue (Fig. 3a). The location and CpG 
density of DMR tiles depends on their resolution, with somatically hypo- 
methylated DMRs substantially enriched for CpG island (CGI)-containing 
promoters, whereas somatically hypermethylated DMRs are more likely 
to be intragenic and distributed further downstream (Fig. 3a and Extended 
Data Fig. 6b, c). To confirm that these signatures represent true imprint- 
like maintenance, we generated RRBS libraries of two unrelated, single 
blastocysts and identified CpGs that could be assigned to each allele, 

Figure 1 | Human preimplantation embryos are globally hypomethylated.
a, Top, DNA methylation across 100-bp tiles for human sperm,
preimplantation embryos, including the ICM and trophectoderm (TE),
ES cell derivation from outgrowth to fifth passage (p5), and somatic fetal tissues
representing all germ layers. Grey highlights the average. n equals the average
number of tiles captured for a given stage. The y axis shows the number of
tiles in each methylation bin. Bottom, boxplots of methylation at different local
CpG densities (y axis). Bullseyes indicate the median, boxes and lines the
25th and 75th, and 2.5th and 97.5th percentiles, respectively. b, Bar plots of
100-bp tiles segregated by non-repetitive (unique) or repetitive designation and
binned by methylation status. Bl, blastocyst; Cl, cleavage; Som, somatic;
Sp, sperm. ‘ES cell’ and ‘Som’ show the average of these time points.
c, Non-repetitive 100-bp tiles are clustered via k-means into 10 dynamics.
Sperm hypermethylated sequences follow three general trajectories: persistent
maintenance, incomplete or complete demethylation. Other dynamics include
sperm specific hypermethylation and hypomethylation shared by sperm
and the early embryo followed by de novo methylation in ES cells. Finally, 3,586
tiles are hypomethylated in sperm and ES cells but methylated in embryos,
representing transient imprint-like signatures. n equals the number of tiles
shared between the timepoints and used for clustering. d, Dynamics for
sperm hypermethylated, non-repetitive tiles as clustered in c. Left heatmap,
per-cluster average of tiles. Right heatmap, −log10 P value of hypergeometric
enrichment for each cluster for intergenic, exonic, intronic, CGI or TSS
annotations using sperm hypermethylated regions as the background. e, Violin
plot for sperm hypermethylated intergenic (Inter), exonic and intronic features.
f, The OBSCN gene exhibits high inter- and intra-genic methylation and an
unmethylated promoter in sperm and ES cells. In cleavage embryos, a 130-kb
region, highlighted in blue, remains specifically methylated while the periphery
is demethylated. Each dot refers to a CpG captured by RRBS. The y axis for
DNA methylation represents the frequency in which captured CpGs are
methylated and ranges from 0 to 1.
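
The hypergeometric enrichment reported in panel d can be sketched as an upper-tail test. The example below is a minimal illustration under stated assumptions: the background is the set of sperm hypermethylated tiles, a cluster is treated as a draw without replacement from that background, and the counts are invented for demonstration. The function name and arguments are hypothetical and do not come from the paper.

```python
from math import comb, log10


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: tiles in the background (sperm hypermethylated tiles)
    annotated:  background tiles carrying the annotation (e.g. exonic)
    drawn:      tiles in the cluster being tested
    overlap:    cluster tiles that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy counts (assumed): 10 background tiles, 4 exonic, cluster of 3, 2 exonic.
p_value = hypergeom_enrichment_p(population=10, annotated=4, drawn=3, overlap=2)
neg_log10_p = -log10(p_value)
```

Exact integer arithmetic is convenient for small counts; for genome-scale tile numbers a log-space implementation would be preferable, because very small P values underflow to zero before the logarithm is taken.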


nearly all of which were monoallelically methylated (Fig. 3b and Extended 
Data Fig. 7). 

Given that maternal DMRs in mouse are also enriched for CGIs, we 
next examined the conservation of targeted loci between species (Fig. 3c). 
We found that 795 and 293 CGIs behave as transient DMRs in human 

Figure 2 | Human preimplantation dynamics are globally similar to mouse.
a, Histograms of methylation changes (Δ methylation) for 100-bp tiles across
human and mouse preimplantation from fertilization (Sp to Cl) through
preimplantation (Cl to Bl) to global remethylation at implantation, as measured
from blastocyst to ES cell in human and ICM to embryonic day 6.5 (E6.5)
epiblast in mouse (Bl to ES cell/Epi). n shows the number of tiles contributing to
each comparison. b, Exons clustered by dynamics in human with equivalent
methylation values for orthologous sequences in mouse. The Δ methylation
heatmap displays the difference in methylation for matched time points.
n shows the number of exons available for cross species comparison. c, Violin
plots of orthologous human sperm hypermethylated exons classified as
maintained versus demethylated (top two clusters in b) and measured over
human and mouse preimplantation.

Figure 3 | Transient maternal DMRs target a divergent set of CpG island
promoters. a, Heatmap of 5,265 100-bp tiles consistent with maternally
contributed monoallelic methylation (Methods). Tiles are partitioned
according to their hypomethylated (≤0.2), intermediate or variable (>0.2,
<0.8) or hypermethylated (≥0.8) resolution in ES cells. Feature annotations
are included as separate heatmaps. b, Boxplots of CGI DMR methylation for
two independent single blastocysts, with heterozygous SNP-linked CpGs
highlighted. Within each replicate, 31 and 33 CGI DMRs contain CpGs that
could be assigned to parental loci. In each case, DNA methylation is restricted
to only one of the two alleles. Untracked refers to the inferred methylation
status before haplotype segregation. Red line signifies the median, boxes and
whiskers the 25th and 75th, and 2.5th and 97.5th percentiles. c, Single CpG
resolution methylation values of a conserved preimplantation-specific DMR in
human and mouse. Human blastocyst data includes information from the
pooled sample as well as for a single blastocyst replicate (purple) with
allele-tracked methylation for 10 CpGs highlighted in pink and blue. Annotated
CGIs are included below. d, Resolution of CGIs that behave as maternal DMRs
in human and mouse. e, Heatmap of orthologous ICRs over human and mouse
preimplantation development. f, Orthologous hypomethylation-resolving
CGI DMRs in human and mouse share only 13 equivalently regulated regions.
When methylation values of mouse or human specific DMRs are tracked in the
alternate species, they are constitutively hypomethylated, indicating that
oogenesis targets equivalent genomic features but at species-specific sequences.
PreImp refers to the average value for cleavage and blastocyst in human or 8 cell
and ICM in mouse.
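
The partition of DMR tiles by their resolution in ES cells (panel a) amounts to a three-way threshold on the ES cell methylation value: at most 0.2, between 0.2 and 0.8, or at least 0.8. A minimal sketch follows; the function name is hypothetical, and it treats a single averaged ES cell value per tile, which is an assumption, since the 'variable' label in the caption may also reflect differences between ES cell lines.

```python
def resolution_class(es_methylation):
    """Classify a DMR tile by its methylation level in ES cells."""
    if not 0.0 <= es_methylation <= 1.0:
        raise ValueError("methylation must lie between 0 and 1")
    if es_methylation <= 0.2:
        return "hypomethylated"
    if es_methylation >= 0.8:
        return "hypermethylated"
    return "intermediate or variable"


# Hypothetical tiles mapped to their ES cell methylation values.
es_values = {"tile_a": 0.05, "tile_b": 0.50, "tile_c": 0.93}
classes = {tile: resolution_class(value) for tile, value in es_values.items()}
```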


and mouse, respectively, with substantially more resolving to hyper- 
methylation in human (Fig. 3d and Supplementary Table 2). Notably, 
human DMRs resolving to hypomethylation are more likely to be annotated
as CGIs in mouse than those resolving to hypermethylation (Extended
Data Fig. 6d, e). We restricted our comparison to DMRs that share CGI 
status and somatic hypomethylation in both species and found that 
maternally contributed, preimplantation-specific DMRs are strikingly 
divergent, with only 7.5% found in human equivalently regulated in mouse. 
No obvious trend distinguished shared from species-specific signatures, 
though several, such as the somatic promoter of DNMT1, indicate conserved
regulatory utility (Extended Data Fig. 8a). The disparity of maternal
methylation targeting contrasts with true ICRs, which are generally conserved
(Fig. 3e and Extended Data Fig. 8b). Moreover, hypomethylation-resolving
DMRs specific to one species are constitutively hypomethylated in the 
other, suggesting that these signatures are frequently and diversely tar- 
geted (Fig. 3f). 

We next investigated species-specific repetitive elements, incorporating 
RNA-seq data to interpret DNA methylation’s role in their regulation.
In human sperm, repetitive elements are frequently incompletely methylated
or hypomethylated. Long-terminal-repeat-containing elements
(LTRs) are unexpectedly bimodal, with only a fraction hypermethylated 
and most displaying gametic escape that persists over preimplantation 
(Extended Data Fig. 9a). Long interspersed nuclear elements (LINEs) are 
generally highly methylated in sperm, demethylated in the early embryo, 
with some partial remethylation in human ES cells and complete hyper- 
methylation in somatic cells (Extended Data Fig. 9b). Finally, short inter- 
spersed nuclear elements (SINEs), which represent the majority of human 
genome repetitive content, exhibit a uniform global behaviour (Sup- 
plementary Table 3). Although intermediately methylated in sperm, 
their dynamics over preimplantation were otherwise similar to inter- 
genic sequences in general (Extended Data Fig. 9c). Alternatively, diverse 
LTR and LINE subfamilies are dynamic, providing several examples 
where DNA methylation appears to participate in specific regulatory 
transitions (Supplementary Tables 4 and 5). 

We found that the bimodality of LTR methylation is explained by 
species-specific endogenous retrovirus 1 (ERV1) family elements. Alter- 
natively, the ERVK family generally maintains high methylation levels, 
similar to observations in mouse and suggesting conserved, constitutive 
targeting (Fig. 4a, Extended Data Fig. 9d–g). After fertilization, expression
sharply transitioned from a MalR-dominated early cleavage state
resembling the oocyte to one composed of ERV1 and ERVK elements 
in the blastocyst and ES cells (Fig. 4b). Unexpectedly, some ERV1s seem 
to be induced later, including after global remethylation in ES cells, indi- 
cating a transition in the specific subfamilies that are expressed. Tran- 
scripts present early in preimplantation are generally from gametically 
hypomethylated ERV1 elements that are downregulated before de novo 
methylation (Fig. 4c, d). For these elements, methylation and expression 
are negatively correlated, indicating discriminatory targeting for even 
extremely related sequences (Extended Data Fig. 9h). In contrast, the 
LTR7 subfamily is hypermethylated in sperm, rapidly demethylated, 
and upregulated in the blastocyst and human ES cells (Fig. 4c, d). LTRs 
are rarely dynamic outside of this early versus late preimplantation axis: 
they are either already expressed in the oocyte and silenced later or are 
induced following demethylation and remain expressed in ES cells. Nota- 
bly, this latter dynamic includes a limited number of recently emergent, 
unrelated subfamilies.

Compared to LTRs, LINEs maintain higher methylation levels and 
only the primate-specific L1PA phylogeny is dynamically expressed (Fig. 5a
and Extended Data Fig. 9c). As the only actively transposing lineage in
humans, L1PA subfamilies emerged as a linear phylogeny. We found
that the human-specific L1HS and its two closest ancestors, L1PA2 and
L1PA3, are demethylated early, whereas older elements maintain higher 
embryonic methylation (Fig. 5b). Correspondingly, nearly all embryonic 
transcription could be attributed to these three youngest subfamilies 
(Fig. 5c and Extended Data Fig. 10a). Given the homology between sub- 
families, we searched for sequence composition changes that may explain 
the preimplantation-specific escape of younger elements. We aligned 
5' untranslated regions (UTRs) of full-length L1PA7 to L1HS and
compared sequences that demarcated demethylated L1PA3-descended
elements from constitutively targeted ancestors. The largest discrete
difference corresponds to an ~130-bp deletion found within the L1PA3
lineage itself that separates older elements from the L1HS progenitor
L1PA3a (Fig. 5d and Extended Data Fig. 10b, c). Intriguingly, the presence
or absence of this region isolates two disparately regulated subpopulations
and marks the transition to embryonic expression (Fig. 5e, f

Figure 4 | LTR subfamily dynamics are divided into early and late
preimplantation phases. a, Violin plots for the four LTR families present in 
human over early development and ES cell derivation. b, Pie charts of LTR 
family expression calculated as the number of fragments per million (FPM) that 
align to elements within the family. c, Mean methylation of notable ERV1, 
ERVK and MalR subfamilies. Three ERV1 subfamilies are included to represent 
discrete dynamics: shared gamete and early embryonic hypomethylation 
(LTR12c), constitutive methylation (HERV9-INT) and rapid demethylation 
(LTR7). The ERVK subfamily LTR5Hs is also demethylated. d, Expression 
dynamics for the same subfamilies in c. LTR12c is expressed early and 
downregulated in the blastocyst. Alternatively, LTR7 is expressed throughout, 
but upregulated in the blastocyst and maintained in ES cells, where it accounts 
for the majority of ERV1 transcripts. Like LTR7, LTR5Hs is only intermediately 
methylated during ES cell derivation and is embryonically induced. 
Alternatively, the ERV1 HERV9-INT remains repressed. MLT1H2 is the 
prevailing MalR transcribed in the oocyte and is lost after fertilization. 
Expression is the fragments per million that align to subfamily elements, 
divided by the kb annotated as the subfamily in the genome (FPKM). 
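
The two expression measures used in the captions can be written out explicitly. The sketch below follows the definitions as stated: FPM is the number of fragments aligning to a family per million sequenced fragments, and the per-subfamily value divides FPM by the kilobases annotated as that subfamily. The function names and the example counts are illustrative assumptions, and the choice of denominator for 'per million' (all sequenced fragments versus aligned fragments only) is not specified here.

```python
def fpm(fragments_aligned, total_fragments):
    """Fragments per million: aligned fragments scaled to one million total."""
    return fragments_aligned * 1e6 / total_fragments


def fpkm(fragments_aligned, total_fragments, annotated_bp):
    """FPM divided by the kilobases annotated as the subfamily."""
    return fpm(fragments_aligned, total_fragments) / (annotated_bp / 1000)


# Assumed toy library: 10 million fragments, 500 aligning to one subfamily
# whose genomic annotation spans 25 kb.
subfamily_fpm = fpm(500, 10_000_000)
subfamily_fpkm = fpkm(500, 10_000_000, 25_000)
```

Normalizing by annotated length makes subfamilies of very different genomic footprint comparable, which matters when a small, young subfamily is compared with an abundant, older one.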


and Extended Data Fig. 10d). This adaptation may represent a specific 
moment in the evolutionary progression of the L1PAs when emerging 
elements evaded a seemingly sequence-directed, repressive mechanism. 
Whether older subfamilies retain expression or transposition potential 
without active silencing, or if this signature reflects a vestigial, host gen- 
ome adaptation that is no longer required, remains to be investigated. 

We present base pair resolution maps of DNA methylation as it is 
dynamically reconfigured during human early development. These data 
identify a set of transient, maternally contributed methylation at CGI 
promoters, the resolution of which suggests independent modes of acquisition:
male germline specific protection against methylation or de novo
targeting in the oocyte of otherwise canonically unmethylated CGIs.
Both are common and show poor conservation compared to classic ICRs, 
indicating that short-lived, parent-specific signatures are less evolutio- 
narily constrained than those persisting after implantation. We find 
that repetitive element regulation is notably diverse in human, more so 
than in mouse, with gametically hypomethylated LTR subfamilies pre- 
sent in the oocyte and early embryo and others sharply demethylated 
and induced embryonically. For LINEs, the stepwise phylogeny within 
the primate-specific L1PA lineage pinpoints a specific, adaptive trans- 
ition. As L1HS elements remain transpositionally active, including somat- 
ically in numerous cancers, the targeting of epigenetic silencing machinery 
during preimplantation may be relevant in identifying the root cause of 
their aberrant induction later. Understanding the regulatory principles
inherent to the early embryo will improve continuing efforts to evaluate
complex traits with unclear modes of epigenetic inheritance. Future
work to characterize the mechanisms that impose these diversely tar- 
geted embryonic methylation patterns will illuminate their contribution 
to normal human development and disease. 

Figure 5 | Emergent L1PA subfamilies escape DNA methylation-associated
repression during preimplantation. a, Pie charts of the LINE expression
divided into the L1PA subfamily, other LINE1 and LINE2 subfamilies. Total
expression is calculated as the number of fragments per million (FPM) that
align to family elements. b, Mean methylation values for the most recent L1PA
subfamilies. In cleavage embryos, L1HS through L1PA3 are demethylated and
maintain these levels through the blastocyst. c, Expression dynamics for the
same subfamilies in b over preimplantation and in ES cells. The three youngest
L1PA subfamilies are induced by the 8 cell stage. Expression is the number
of fragments per million that align to subfamily elements, divided by the kb
annotated as the subfamily in the genome (FPKM). The colours in c are the
same as in b. d, Composite plot of cleavage stage methylation values across
aligned 5' UTRs in L1PA subfamilies. The composite for L1PA3 is split by the
presence (red) or absence (blue) of a ~130-bp sequence that distinguishes
L1PA3b from L1PA3a and demarcates methylation values between older and
newer subfamilies (highlighted in pink). Multiple sequence alignment for each
subfamily to the assembled consensus is below each composite, with blue
corresponding to conservation, black to divergence, and white to gaps or
deletions. The x axis represents position along the L1HS 5' UTR and a portion
of ORF1. CpG Frequency describes per CpG conservation within single
elements to the consensus. Two older sequences specific to L1PA7 are
highlighted in grey. e, Boxplot of L1PA methylation in cleavage embryos, sorted
by the presence of the ~130-bp sequence for all elements and L1PA3
specifically. Preimplantation methylation is higher for elements that contain
this insert. Bold line signifies the median, boxes and whiskers the 25th and 75th,
and 2.5th and 97.5th percentiles. f, Expression composite of full-length insert
deleted L1PA3a and insert containing L1PA3b subfamilies in oocyte, 8 cell
and blastocyst stage embryos. Transcriptional induction is not apparent until
after fertilization and is specific to L1PA3a. Read count is the read coverage
normalized by total reads (Methods).


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

IRB approval. Harvard University institutional review board (IRB) and Embryonic 
Stem Cell Research Oversight (ESCRO) Committee approval was obtained for both 
the collection and experimental use of surplus embryos resulting from infertility 
treatment and donated for research. 

The Harvard University and Beth Israel Deaconess Medical Center institutional
review boards both determined that collection, preparation, and experiments using
discarded human gametes were not human subjects research and therefore did not
require a full IRB review.

Consent process. All embryos used in this study had previously been donated and 
stored at Harvard University. Couples donating surplus embryos for the purpose 
of research signed an extensive consent form at the time of their donation. These 
consent forms were approved by the Harvard University IRB. The authors did not 
have access to any identifying personal health information. 

Gametes were obtained from patients who signed a consent form authorizing
the experimental use of discarded gametes. These consent forms were scanned and
stored in the patient’s electronic medical record. The authors did not have access to
any identifying personal health information.

Human sperm collection and preparation. Semen samples were collected from 
five healthy patients between the ages of 30 and 34 undergoing an evaluation for 
infertility. Each male was a non-smoker with a body mass index <30 kg m⁻². Semen
samples were collected by masturbation after 2 days of abstinence. A semen analysis
was performed by an experienced andrologist confirming a normal sperm concentration
(>20 million ml⁻¹), normal motility (>50% motile), and normal morphology
using the Kruger strict criteria (≥4% normal forms).

A PureCeption gradient solution (PureCeption 100% Isotonic Solution, Quinn’s 
Advantage Medium with HEPES, In-Vitro Fertilization Inc.) was prepared to purify
each sperm sample and remove somatic cell contaminants. The gradient consisted 
of two layers of 1 ml of PureCeption: 90% and 47% in a 15-ml conical tube. 

Two millilitres of the semen sample was placed on top of the gradient. The gra- 
dient was centrifuged at 1100 r.p.m. for 20 min and the supernatant removed. One 
millilitre of sperm washing medium (Quinn’s Sperm Washing Medium, In-Vitro
Fertilization Inc.) was used to re-suspend the pellet. The sample was then centrifuged
at 750 r.p.m. for 10 min. The supernatant was removed. 0.1 ml of the remaining
pellet of sperm was transferred to a 1.7 ml SafeSeal Microcentrifuge Tube (Sorenson
Bioscience) and placed immediately at −80 °C.

Human embryo thawing. Excess human embryos created via in vitro fertilization 
for the treatment of infertility were previously donated by patients undergoing assisted 
reproduction and stored in liquid nitrogen at −196 °C.

Embryo culture dishes were set up using 60-mm culture dishes (BD Falcon) and 
eight 30-µl drops of Global embryo culture media (LifeGlobal) plus 15% Plasmanate
(Talecris) overlaid with 10 ml of oil (SAGE). Rinse dishes were also set up using
2.5 ml of embryo culture media plus 15% Plasmanate. These dishes were equilibrated
overnight at 37 °C, 5% CO2.

Both cleavage stage embryos and blastocysts were thawed using the Quinn’s Advan- 
tage Embryo Thaw Kit (SAGE). This kit contains three solutions: 0.5 M sucrose, 0.2 M 
sucrose and diluent. The straw or the vial containing the cryopreserved embryos 
was placed in a water bath at 30 °C for 2 min. The embryos were then expelled from 
the straw by removing the heat-sealed end or transferred from the vial using a Pasteur 
pipette to a clean tissue culture dish on a heated stage. The embryos were located 
and thawed according to the manufacturer’s instructions. 

Each embryo was rinsed and then placed in a single drop in the embryo culture 
dishes described above. Embryos were cultured in a humidified atmosphere at 
37 °C and 5% CO2 in air.

Embryo evaluation. Cleavage stage embryos were evaluated 2 to 4 h after the 
thawing process. Blastocysts were evaluated 18 to 24 h after the thawing process. 
Embryos were evaluated using a Nikon Eclipse 80i microscope and images of each 
embryo were obtained at 40X using the Hamilton Thorne Clinical Laser Software. 
These images were then independently evaluated by two senior embryologists. Sur- 
vival and quality were determined and an embryo was only included in this study if 
both embryologists agreed upon viability. 

Human embryo collection. Single viable embryos were briefly passed through 
several rounds of additional defined KSOM media (Millipore) under mineral oil 
before an Acidic Tyrode’s Solution (Sigma) wash to dissolve the zona pellucida, 
somatic cellular debris, and additional sperm. Single embryos were then rinsed in 
clean media drops before pooling of the embryos, assessment for the absence of 
contaminants and snap freezing in minimal volume. 

ICM and trophectoderm isolation of human and mouse embryos. All mouse 
experiments were conducted according to Harvard animal welfare guidelines and 
approved by the IACUC. Single human and mouse embryos exhibiting a clear ICM 
were isolated and the zona pellucida carefully removed to preserve the integrity of 
the blastocoel. They were then positioned using standard micromanipulation equipment
(Narishige), oriented with a clear plane available to separate the ICM from the
expanded trophectoderm and dissected using a Hamilton Thorne XYClone laser
(Hamilton Thorne Biosciences) with 300 µs pulsing at 100% intensity. Short pulses
progressed over the cleavage plane until the ICM and trophectoderm compart- 
ments were cleanly separated, at which point both pools were separated for col- 
lection in clean microdrops, serially washed and snap frozen in minimal volume. 
Mouse E3.5 blastocysts were isolated from hormone-primed C57BL/6J female mice
4 to 6 weeks of age 3 days after mating with 129S1/SvImJ males.

Derivation of new human ES cell lines. Human embryo culture and human ES 
cell derivations were carried out as previously described. In brief, cleavage stage
human embryos were thawed using Quinn’s Advantage Thaw Kit (SAGE) and cul- 
tured in Global medium (LifeGlobal) supplemented with 15% Plasmanate (Talecris) 
for 2 to 3 days until the blastocyst stage. For human ES cell derivation, the visible 
ICM was separated from the blastocyst by exposing the trophectoderm cells to 20- to 
30-cell lethal laser pulses from a Hamilton Thorne XYClone laser (Hamilton Thorne
Biosciences). The isolated ICM was then plated on a layer of gamma-irradiated mouse
embryonic fibroblasts (MEFs) in derivation media consisting of KO-DMEM (Life
Technologies), 15% KO-SR (Life Technologies), 2.5% Fetal Bovine Serum (FBS)
(Hyclone), 2 mM Glutamax, 1% non-essential amino acids, 50 units per ml penicillin
and 50 µg ml⁻¹ streptomycin (Life Technologies), 0.055 mM β-mercaptoethanol
(Life Technologies), 10 ng ml⁻¹ bFGF (Millipore). Ten to twelve days after ICM
plating, the embryonic stem cell outgrowth (passage 0) was mechanically dispersed 
with half of the outgrowth plated onto a new MEF feeder layer for human ES cell 
line establishment (passage 1), and half used for methylation analysis. The human 
ES cell colonies that resulted following plating were continuously mechanically 
passaged. The pluripotency of the lines was confirmed by staining for pluripotency 
markers and by in vitro differentiation into the 3 germ layers. The lines were registered
with the Harvard University ESCRO Committee as HUES 71, HUES 72 and
HUES 73.

Isolation of mouse E6.5 epiblast and extraembryonic ectoderm. Isolation of 
E6.5 epiblast and extraembryonic ectoderm was adapted from ref. 27. Hormone- 
primed C57BL/6J female mice 4 to 6 weeks of age were euthanized 6 days after mating
with 129S1/SvImJ males. Deciduae were removed from the uterine horn and the
full embryo extruded and placed under mineral oil in KSOM media using a pulled 
glass capillary. Residual maternal contaminants were removed by continuous mouth 
pipetting, after which the epiblast and extraembryonic ectoderm were bisected using 
an obliquely cut flame drawn glass capillary and the respective tissues segregated in 
separate KSOM drops. Visceral endoderm was removed from either epiblast or 
extraembryonic ectoderm by incubation in 0.5% trypsin, 2.5% pancreatin (Sigma) 
dissolved in PBS for 20 min at 4 °C, after which they were returned to KSOM medium 
drops and incubated for an additional 5 min at room temperature. Using a glass 
capillary pulled to a diameter slightly less than that of the embryo, visceral endo- 
derm was removed by rapid aspiration and expulsion. Cleaned epiblast or extra- 
embryonic ectoderm tissue were then serially washed through several additional 
drops of KSOM before pooling and snap freezing at minimal volume. 

Library preparation and sequencing. RRBS libraries were generated as described 
and sequenced on an Illumina Genome Analyzer II before alignment and analysis.
The sequencing reads were aligned to the Human Genome Build 19 (hg19)
for human samples and Mouse Genome Build 37 (mm9) for mouse samples using 
a custom computational pipeline taking into account the strain background for mouse 
samples. The data set was supplemented with mouse early development methylation 
profiles from ref. 7 and human fetal somatic methylation profiles from the NIH 
Epigenomics Roadmap Project. Human ES cell lines H1, H9, HUES64, and HUES6 
were used for comparison with our newly derived human ES cell lines. Sample quality 
was assessed by looking at coverage numbers (that is, number of loci present and 
coverage of loci) and similarity between biological replicates using Pearson cor- 
relation, Euclidean distance, and visual inspection of methylation histograms (see 
Extended Data Figs 2 and 7). 

Estimating methylation levels. The methylation level of each sampled cytosine 
was estimated as the number of reads reporting a C, divided by the total number of 
reads reporting a C or T. Single CpG methylation levels were limited to those CpGs 
that had at least tenfold coverage. For 100-bp tiles, reads for all the CpGs that were 
covered more than fivefold within the tile were pooled and used to estimate the 
methylation level as described for single CpGs. The CpG density for a given single 
CpG is the number of CpGs 50 bp up- and downstream of that CpG. The CpG 
density for a 100-bp tile is the average of the CpG density for all single CpGs used 
to estimate methylation level in the tile. 
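The estimates described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the function names are hypothetical, read counts are assumed to be available as (C, T) pairs per CpG, and "covered more than fivefold" is treated here as at least five reads, which is an assumption.

```python
from typing import Iterable, Optional, Tuple


def cpg_methylation(c_reads: int, t_reads: int, min_coverage: int = 10) -> Optional[float]:
    """Methylation of one cytosine: reads reporting C / reads reporting C or T."""
    total = c_reads + t_reads
    if total == 0 or total < min_coverage:
        return None  # below the coverage cutoff, so no level is reported
    return c_reads / total


def tile_methylation(cpgs: Iterable[Tuple[int, int]], min_coverage: int = 5) -> Optional[float]:
    """Pool (C, T) read counts over the sufficiently covered CpGs of a tile."""
    pooled_c = 0
    pooled_total = 0
    for c_reads, t_reads in cpgs:
        if c_reads + t_reads >= min_coverage:  # assumed reading of the cutoff
            pooled_c += c_reads
            pooled_total += c_reads + t_reads
    if pooled_total == 0:
        return None  # no CpG in the tile passed the cutoff
    return pooled_c / pooled_total
```

Pooling reads, rather than averaging per-CpG levels, weights each CpG by its coverage, so deeply sequenced sites contribute more to the tile estimate.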

The methylation level reported for a sample is the average methylation level 
across replicates. A replicate will contribute to the average only if it meets the cov- 
erage criteria within the replicate. Technical replicates were averaged before con- 
tributing to the sample average. 

Genomic features. For mouse, high density CpG promoters (HCPs), intermediate 
density CpG promoters (ICPs), low density CpG promoters (LCPs), CpG island,
and DMR annotations were taken from ref. 28. LINE, LTR and SINE annotations 
were downloaded from the UCSC (University of California, Santa Cruz) browser
(mm9) RepeatMasker tracks. Gene annotations were downloaded from the UCSC 
browser (mm9) RefSeq track. Promoters (TSSs) are defined as 1 kb up- and down- 
stream of the TSS and are parsed from RefSeq annotation. Promoters for all iso- 
forms are included. Enhancer annotations were taken from Supplementary Table 1 
of ref. 29. Corresponding human annotations were downloaded from the UCSC 
browser for hg19. Human imprinting control regions were taken from ref. 30. 

In each case, the methylation level of an individual feature is estimated by 
pooling read counts for all CpGs within the feature that are covered greater than 
fivefold, and levels are only reported if a feature contains at least 5 CpGs with such 
coverage (in contrast to 100-bp tiles where no minimum number of CpGs is required). 
A tile is annotated as a genomic feature if any portion of the tile overlaps with the
feature and thus may be annotated by more than one feature (for example, the same
tile can be annotated as both a promoter and a gene).

Gene expression analysis. Raw RNA-seq data were downloaded from the Gene
Expression Omnibus (accession GSE36552) for human oocytes, zygotes, 2 cell,
8 cell, morulae, late blastocyst, human ES cell passage 0 and human ES cell passage 
10, taken from ref. 13. For all samples but morulae and late blastocyst, data for all 
single cells were pooled before alignment. Morulae and late blastocyst samples 
were pooled according to their respective embryos. Duplicate reads were removed 
before alignment. Alignment was performed using TopHat against human genome
assembly 19 with default settings. Cufflinks was used for quantification and
statistical tests of significant change using default settings. 

Retrotransposon expression analysis. Alignment was performed using BWA against 
human genome assembly 19 with default settings. Repeat subfamily FPKM is the 
sum of the number of reads that align to each repeat element for the subfamily divided 
by the genome coverage of the subfamily in kilobases and normalized by the total 
number of reads in the sample. Repeat subfamily FPM is the same as FPKM without 
normalizing to the subfamily’s genome coverage. Samtools was used to find can- 
didate repeat alignments and the CIGAR string was parsed to determine whether 
the read overlapped with the repeat element. One L1PA6 element (chr2:49454725-49460932)
contained an unusually tall, narrow peak of read density and was excluded
from analysis. 
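As a rough illustration of the two repeat-subfamily measures defined above, the following Python sketch computes FPM and FPKM from per-element read counts. The function names are hypothetical, and the scaling of the total read count to millions is an assumption implied by the names FPM and FPKM rather than stated in the text.

```python
from typing import Iterable


def repeat_fpm(reads_per_element: Iterable[int], total_reads: int) -> float:
    """Reads aligned to all elements of a subfamily, per million reads in the sample."""
    return sum(reads_per_element) / (total_reads / 1e6)


def repeat_fpkm(reads_per_element: Iterable[int], subfamily_bp: int, total_reads: int) -> float:
    """FPM further divided by the subfamily's genome coverage in kilobases."""
    return repeat_fpm(reads_per_element, total_reads) / (subfamily_bp / 1e3)
```

For example, a subfamily with 100 aligned reads covering 25 kb in a sample of 2 million reads would score an FPM of 50 and an FPKM of 2 under these assumptions.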

The L1PA consensus plot for expression was produced by using the consensus 
coordinates for the element from the UCSC genome browser to align the reads 
from the element’s alignment to the consensus sequence. Read count is the read cov- 
erage at each position in the consensus sequence divided by the total number of 
reads in the sample. Note that we did not use a multiple sequence alignment of the 
repetitive elements so this is not a fully accurate representation of expression over 
the consensus. For example, if the element has an insertion compared to the con- 
sensus sequence, then reads that overlap the insertion will contribute to read density on 
the consensus. The genomic sequence density plot was generated in a similar manner.

Orthology between human and mouse. The 46 mammals multiple sequence alignment
downloaded from the UCSC browser was used to find orthologous regions
from hg19 to mm9. For 100-bp tiles, the methylation for the corresponding region 
in mouse was used for comparison regardless of the length of the corresponding 
region. For genomic features, the methylation level of the corresponding region in 
mouse was used unless the mouse coordinates overlapped a mouse annotation of 
the same type. In the latter case, the methylation level of the corresponding feature 
was used instead. The 60 mammals multiple sequence alignment downloaded from 
the UCSC browser for mm10 was used to find orthologous regions from mouse to 
human by first translating the mm9 genomic feature coordinates to mm10, and then 
following the same procedure above. 

RRBS selectively enriches for a consistent fraction of CpG dense genomic fragments
within a given species, and as such provides genome-scale rather than genome-wide information.
genome-wide. In human and mouse, the coverage for CGIs is 87.4% and 89.6%, for 
exons, 11.3% and 8.8%, and for introns, 26.4% and 13.4%, respectively. In general, 
for both species, most features are captured at similar frequencies, but far more 
SINEs are captured in human than in mouse. Of the CGIs that are captured by 
RRBS in human, 82.7% align to the mouse genome, and 53.9% align and share CGI 
status in mouse. Out of the loci that aligned to the mouse genome, mouse RRBS 
captured 93.6% of shared CGIs and 72.9% of aligned regions not annotated as CGIs
in mouse. 85.2% of RRBS-captured human exons align to mouse exons and, of these, 
52.7% are captured by mouse RRBS. For introns, 86.1% of human introns align to 
mouse introns and, of these, 34.2% are captured in mouse. 

Derivation sliders used to compare human ES cells to mouse pre- and
postimplantation pluripotent tissues. Regions with a methylation difference >0.1
between mouse ICM and mouse epiblast were used to assess the similarity of a 
sample to either of these tissues. For each sample, a region scores as mouse ICM if 
its methylation level is more similar to mouse ICM than mouse epiblast and vice 
versa. Regions with methylation values that are equidistant from mouse ICM and 
mouse epiblast are excluded. If the slider is viewed as going from 0 (mouse ICM) 
to 1 (mouse epiblast), then the position of a sample is simply the proportion of 
regions that scored mouse ICM over the total number of regions that contribute a 
score. 
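The scoring rule can be sketched as follows. This is a minimal, hypothetical Python illustration: the function name and the list-based inputs (one methylation value per region, in matching order) are assumptions, and the returned value is the proportion of regions scored as mouse ICM, exactly as the text states it.

```python
from math import isclose
from typing import Optional, Sequence


def slider_position(sample: Sequence[float], icm: Sequence[float],
                    epiblast: Sequence[float], min_diff: float = 0.1) -> Optional[float]:
    """Proportion of discriminating regions whose methylation is closer to mouse ICM."""
    icm_like = 0
    epi_like = 0
    for s, i, e in zip(sample, icm, epiblast):
        if abs(i - e) <= min_diff:
            continue  # region does not discriminate the two mouse tissues
        d_icm, d_epi = abs(s - i), abs(s - e)
        if isclose(d_icm, d_epi):
            continue  # equidistant regions are excluded
        if d_icm < d_epi:
            icm_like += 1
        else:
            epi_like += 1
    scored = icm_like + epi_like
    return icm_like / scored if scored else None
```

Taking one minus the returned value gives the epiblast-like proportion, which may be the more natural quantity if the axis is read from 0 (ICM) to 1 (epiblast).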

Clustering and feature enrichment. One hundred base pair tiles were clustered 
using k-means clustering. Clusters were designated as hypermethylated (≥0.5) or
hypomethylated (<0.5) in sperm according to the cluster centre. A tile and a feature 
were designated as overlapping if there was an overlap of 1 bp or more between 
them. Feature enrichment scores are the negative log of the P value calculated using 
the hypergeometric distribution. 
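A feature enrichment score of this kind can be sketched with the Python standard library alone. The sketch below is illustrative and makes several assumptions not stated in the text: the P value is taken as the upper tail P(X ≥ k), the logarithm is base 10, and the function and parameter names are hypothetical.

```python
from math import comb, log10


def hypergeom_upper_tail(k: int, total_tiles: int, feature_tiles: int, cluster_tiles: int) -> float:
    """P(X >= k) for k feature-overlapping tiles in a cluster drawn without replacement."""
    denom = comb(total_tiles, cluster_tiles)
    top = min(feature_tiles, cluster_tiles)
    numer = sum(
        comb(feature_tiles, i) * comb(total_tiles - feature_tiles, cluster_tiles - i)
        for i in range(k, top + 1)
    )
    return numer / denom


def enrichment_score(k: int, total_tiles: int, feature_tiles: int, cluster_tiles: int) -> float:
    """Negative log10 of the hypergeometric P value (base 10 is an assumption)."""
    return -log10(hypergeom_upper_tail(k, total_tiles, feature_tiles, cluster_tiles))
```

Exact integer binomial coefficients avoid floating-point underflow for large tile counts, at the cost of speed; a statistics library would be the practical choice for genome-scale inputs.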

Identification of gametic differentially methylated regions in human. Regions 
that have low methylation in sperm and some methylation in the early embryo 
likely represent instances of maternal methylation if the assumption that there is 
little to no de novo methylation over the cleavage divisions is true. A region is con- 
sidered a maternally methylated differentially methylated region if: first, it is signi- 
ficant after a two sample t-test between sperm and blastocyst with equal variance 
after correction for multiple hypothesis testing (q value <0.05 using the Storey 
method, ref. 31); second, it has a methylation difference ≥0.2 higher in blastocyst
than in sperm using sample means; and third, it has a mean methylation level ≥0.2
for 8 cell. These criteria were applied to both 100-bp tiles and to CGIs. 
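The three criteria can be combined into a simple filter, sketched below in Python. This is a hypothetical illustration: it assumes the q value from the t-test has already been computed elsewhere, reads both methylation thresholds as "at least 0.2", and uses invented function and parameter names.

```python
def is_maternal_dmr(q_value: float, sperm_mean: float, blastocyst_mean: float,
                    eight_cell_mean: float, q_cutoff: float = 0.05,
                    min_diff: float = 0.2, min_level: float = 0.2) -> bool:
    """Apply the three criteria to one region (a 100-bp tile or a CGI)."""
    significant = q_value < q_cutoff                            # criterion 1
    gained = (blastocyst_mean - sperm_mean) >= min_diff         # criterion 2
    present_at_8_cell = eight_cell_mean >= min_level            # criterion 3
    return significant and gained and present_at_8_cell
```
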
Identification of human SNPs. SNPs in human were downloaded from the 1000 
Genomes Project (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/). 
SNPs that are not trackable by RRBS (C/T or A/G) and positions that are not 
covered in an in silico digest of hg19, that is, covered by RRBS, were removed. The 
ratio (reference count/(reference count + alternative count)) was calculated for 
each SNP in the single human blastocyst samples and SNPs with ratios <0.2 or ratios 
>0.8 were removed since they likely represent homozygous alleles in the sample. We 
used the resulting genotypes to facilitate parent-of-origin methylation tracking. 
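The allele-ratio filter described above can be sketched as follows. The function name and the handling of uncovered SNPs are assumptions made for illustration; only the ratio definition and the 0.2 and 0.8 cutoffs come from the text.

```python
def is_heterozygous(ref_count: int, alt_count: int,
                    low: float = 0.2, high: float = 0.8) -> bool:
    """Keep a SNP when reference / (reference + alternative) lies within [low, high]."""
    total = ref_count + alt_count
    if total == 0:
        return False  # assumption: SNPs with no covering reads are not kept
    ratio = ref_count / total
    # Ratios below `low` or above `high` likely reflect homozygous alleles.
    return low <= ratio <= high
```
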
Parent-of-origin methylation tracking. Reads were segregated into either the ref- 
erence or alternative allele, and CpG methylation levels were called in the same man- 
ner described above. SNP normalized methylation values (Extended Data Fig. 7) 
are the average of the methylation values derived from each haplotype. 

L1PA sequence alignment. All LINE L1HS and L1PA2-7 elements ≥5,500 bp
were taken from the human genome and aligned using Muscle with the following
parameters: muscle -maxiters 2 -diags. Methylation levels are from full-length
elements that are captured by RRBS. The ~130 bp insert sequences common to 
elements L1PA3b and older were similarly aligned for Extended Data Fig. 10d. 
Extended Data Figure 1 | Isolation of human preimplantation embryos
for DNA methylation mapping. a, Three replicates of D6 embryos, ranging
in inputs from three to five embryos, were thawed, screened for proper
morphology, independently isolated from the zona pellucida and pooled before
RRBS profiling. Embryos are displayed before purification. b, Two replicates
of 18 and 19 human D3 cleavage stage embryos were thawed, screened for
proper morphology, assessed for embryonic stage and cell number, and
purified from the zona pellucida before RRBS profiling. Embryos are displayed
before purification. c, Cell numbers from thawed cleavage stage embryos
ranged from 4 to 11 cells per embryo with a median of 8 (±1.6 standard
deviation) cells. Within each replicate, only three embryos demonstrated onset
of compaction at the time of collection. Red line signifies the median, boxes and
whiskers the 25th and 75th, and 2.5th and 97.5th percentiles, respectively.
Extended Data Figure 2 | Assembly of a genome-scale DNA methylation
time series through human early development and over ES cell derivation.
a, Summary of RRBS libraries generated, with number of biological replicates
(n), number of CpGs captured at 1×, 5× and 10×, mean and median methylation
values for 100-bp tiles estimated from CpGs covered at ≥5×, and mean
Euclidean distance and Pearson correlation across biological replicates for these 
tiles. b, Pearson correlation matrix for sperm, early embryonic and fetal tissue 
samples. c, Clustering of gametic, somatic and preimplantation methylation 
profiles segregate according to their global DNA methylation landscape, with 
sperm and fetal tissue forming a somatic methylation cluster that contrasts the 
unique epigenetic landscape present in preimplantation embryos. d, Summary 
of RRBS libraries generated for ES cell derivation, with number of biological 
replicates (n), number of CpGs captured at 1×, 5× and 10×, mean and median
methylation values for 100-bp tiles estimated from CpGs covered at ≥5×, and
mean Euclidean distance and Pearson correlation across biological replicates
for these tiles. ‘ES cell ref’ refers to a reference collection of previously assayed
ES cell lines as part of the NIH Roadmap Epigenomics Project (Methods).
Human ICM and trophectoderm (TE) were isolated through laser-assisted 
microdissection. e, Pearson correlation matrix for human samples used to 
model ES cell derivation. A consistent signature is rapidly acquired by the 
outgrowth stage (p0) and stably maintained over additional passages. 

f, Methylation histograms for 100-bp tiles for human blastocysts and dissected 
ICM and TE tissue show minimal global difference, which is also observed 
when comparing previously assayed, immunosurgically purified mouse ICM to 
mechanically dissected ICM and TE. g, Boxplots of the change in methylation 
(Δ methylation) for 100-bp tiles from cleavage to the blastocyst stage show
passive demethylation, particularly for regions that
exhibit the highest methylation levels at this stage. The red line signifies the 
median, boxes and whiskers the 25th and 75th, and 2.5th and 97.5th percentiles, 
respectively. 


©2014 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 3 graphic not reproduced. Panel a: methylation histograms against mouse ICM, E6.5 Epi and E6.5 ExE. Panel b: sliders from ICM to Epi for ICM, outgrowth and ES cell ref samples across feature sets: CGI (n=1,220); promoters: TSS (n=1,142), ICP, LCP (n=12); intragenic: genes (n=13,275), exons (n=3,275), introns (n=13,080); enhancers: pluripotent (n=551), shared (n=903), somatic (n=466).]

Extended Data Figure 3 | Comparison of human ES cell derivation to in vivo
mouse pluripotent tissues. a, Global methylation histograms of 100-bp
tiles for human ICM and p5 ES cells (rows) compared against mouse
preimplantation and postimplantation embryos (ICM, E6.5 Epi), as well as with
extraembryonic ectoderm (ExE) (columns), demonstrate the rapid acquisition
of an epiblast-like, somatic methylation pattern upon ES cell derivation in
human. b, Regions that discriminate mouse ICM from E6.5 epiblast were used
to assign human ICM and ES cells to an equivalent in vivo pluripotent state for
orthologous features. The position along the axis from preimplantation (ICM)
to postimplantation (Epi) pluripotency represents the proportion of regions
in a set that resemble one state versus the other. For all feature sets, human
ES cells rapidly establish an epiblast identity, maintaining this signature from
the outgrowth stage over ensuing passages. ‘ES cell ref’ refers to a reference
collection of previously assayed ES cell lines as part of the NIH Roadmap
Epigenomics Project (Methods).
Extended Data Figure 4 | Inverse correlation between expression and
promoter methylation is retained during human preimplantation. 

a, Scatterplots of oocyte, preimplantation embryo and ES cell derivation gene 
expression compared to promoter methylation display a canonical negative 
correlation, even during preimplantation where the range of promoter 
methylation values is contracted by global hypomethylation. b, Box plots of 
gene expression values for genes significantly upregulated by ≥twofold from
oocyte to 8 cell compared to non-dynamic genes and categorized by promoter 
methylation dynamics. Genes that are both demethylated and upregulated 
are associated with induction from a silenced state, while those that are 
demethylated but not upregulated display only basal level transcription that is 
significantly lower than observed in promoters that are not demethylated. Bold
line signifies the median, boxes and whiskers the 25th and 75th, and 2.5th and
97.5th percentiles, respectively. c, Gene expression dynamics following 
fertilization for hypermethylated sperm promoters demethylated ≥0.5 by the
cleavage stage compared to the rest of promoters (Other). 123 of 541 (22%)
demethylated promoters demonstrate significant upregulation (≥twofold)
compared to only 13.6% of other promoters. Moreover, the ratio of upregulated 
to downregulated genes in the demethylated set substantially favours zygotic 
activation, while other promoters include a higher proportion of 
downregulated maternal transcripts (odds ratio = 1.877, P = 1.344 X 10°, 
hypergeometric test). d, RNA-seq track of the pluripotency promoting, 
zygotically induced gene POU5F1, whose promoter is demethylated from 0.59 
in sperm to 0.02 in cleavage, concurrent with its transcriptional induction. 
Extended Data Figure 5 | Local retention of DNA methylation is similar for
introns over human and mouse preimplantation. a, Introns are clustered 
according to their dynamics in human and the methylation of orthologous 
regions are tracked in mouse. Divergence is predominantly restricted to 
intermediately methylated features in human sperm that are generally 
hypermethylated in mouse. The Δ methylation heatmap displays the difference
in methylation values between equivalent preimplantation time points, with ES 
cells in human serving as a proxy for comparison to the E6.5 epiblast in mouse. 
Deviation is most apparent for intermediately methylated human sperm 
introns, where they are less methylated than in mouse sperm. RMSK included:
repeat masker annotated regions included. n shows the number of introns
available for cross species comparison. b, When repetitive elements are
removed from the calculation of intron methylation, the apparent divergence
between mouse and human values in sperm is diminished. Methylation and
Δ methylation heatmaps are as in a. Gray denotes missing values (m.v.) where
estimates for intronic methylation were exclusively derived from repetitive 
elements. RMSK-free: repeat masker annotated regions excluded. c, Violin 
plots of the two main dynamics (maintained versus demethylated, top two 
clusters in b) for sperm hypermethylated introns over human and mouse 
preimplantation after repetitive elements are removed. As observed for 
orthologous exons, regions that retain high methylation throughout human 
preimplantation are conserved, hypermethylated in both mouse gametes, 
and display maintained regulation as early as the zygote stage. 
Extended Data Figure 6 | Genomic characterization of transient maternally
contributed imprint-like regions. a, Heatmap of 100-bp tiles in mouse 
preimplantation identified using the same criteria as applied to human 
(Methods). These criteria, which assume limited de novo methylation, identify
2,044 tiles in mouse where methylation is ≥0.2 in both 8 cell and the ICM,
there is a ≥0.2 methylation difference between the ICM and sperm, and this
difference is significant by t-test (q value <0.05). 89% of those tiles that are
captured in the mouse oocyte are monoallelically inherited and show 
significant differences between the gametes by t-test, providing an empirical 
upper bound on the false discovery rate for this strategy when applied to human 
of =0.11, assuming the underlying principles of imprint regulation are the 
same as in mouse during this developmental stage. b, The proportion of 100-bp 
tiles, classified according to their resolution in ES cells, for each genomic 
feature presented in Fig. 3a. c, Cumulative density function (CDF) plot of 
the distance to the nearest annotated TSS for CGI DMRs that resolve to 
hypomethylation, intermediate or variable methylation, or hypermethylation. 
There is a discrepancy in genomic location between those that resolve to 
hypomethylation, of which a sizable fraction are in the TSS, and those that do
not, which are generally enriched further downstream. d, Boxplots of
CpG density for CGI DMRs that resolve to hypomethylation, intermediate
or variable methylation, or hypermethylation paired with comparable
non-DMR CGIs (Somatic). Those resolving to hypomethylation have higher
CpG densities than those that resolve to intermediate or variable, or 
hypermethylation, but have slightly lower CpG density than non-DMR, 
constitutively hypomethylated CGIs. Alternatively, while CGIs that resolve to 
hypermethylation show a lower CpG density than other DMRs, they show 
higher density than non-DMR hypermethylated islands, suggesting some 
level of protection against deamination as an attribute of their uniquely 
hypomethylated status in the male germline. e, Pie charts of cross species 
alignment and CGI status of human CGI DMRs into mouse. Those that resolve 
to hypomethylation are more often conserved in mouse and more frequently 
retain their CGI status, whereas those resolving to hypermethylation are less 
conserved. Moreover, intermediate/variable and hypermethylation-resolving 
regions that do align are less frequently retained as CGIs. 368, 166 and 260 CGIs 
comprise the hypo, intermediate or variable, and hyper methylation sets, 
respectively. 
Extended Data Figure 7 | Generation of single blastocyst libraries confirms
the monoallelic behaviour of putative maternal DMRs. a, Summary of two
single blastocyst RRBS libraries. Number of CpGs captured at 1×, 5× and 10×,
mean and median methylation values for 100-bp tiles estimated from CpGs
covered at ≥5×, and mean Euclidean distance and Pearson correlation when
single blastocyst replicates are compared to the pooled blastocyst time point.
b, Histograms of DNA methylation for 100-bp regions captured for each single
blastocyst replicate. c, The ratio of reference allele to alternative allele for single
nucleotide polymorphisms (SNPs) called as heterozygous in each blastocyst 
replicate. d, For the 4,492 and 5,118 SNPs that were considered as heterozygous 
within each single blastocyst, 10,068 and 11,415 single CpGs could be assigned 
to alleles. Scatterplots depict untracked methylation values for these CpGs 
against their normalized methylation values, which are the average of their 
monoallelic methylation states. 
Extended Data Figure 8 | The somatic promoter of DNMT1 is maternally
methylated in human and mouse. a, Plots of single CpG methylation for 
DNMT1, including a CGI over the somatic promoter that behaves as a 
transient, preimplantation-specific DMR in both human and mouse. In mouse, 
hypermethylation of this island corresponds to its transcriptional readthrough 
and exclusion as part of an oocyte-specific isoform (Dnmt1-o) that is not
annotated in human. Annotated CGIs and species conservation tracks are 
included for reference below. b, Heatmap of orthologous ICR dynamics over 
human and mouse preimplantation. Of those that map between species and are 
captured by RRBS, all but one (PEG10) behave identically. 
Extended Data Figure 9 | Repetitive element regulation during human and
mouse preimplantation. a, Violin plots for LTRs over human and mouse 
development. In human, LTRs demonstrate a bimodal distribution in sperm. 
Hypermethylated LTRs display a range of demethylation in the early embryo 
that reflects the dynamics of subfamilies. Upon ES cell derivation, and within 
fetal tissues, LTRs become stably hypermethylated. Alternatively, during mouse 
preimplantation, LTRs are consistently hypermethylated in sperm and 
generally retain methylation over preimplantation. E6.5 Epi and E6.5 ExE refer 
to dissected epiblast and extraembryonic tissue from E6.5 embryos. b, Violin 
plots for LINEs over human and mouse development. In human sperm, LINEs 
are unstably hypermethylated, with discrete populations methylated with a 
mean of ~0.75, 0.9, and a small subpopulation showing gametic escape from 
high methylation. Alternatively, LINEs are indiscriminately hypermethylated 
in mouse sperm. In both species, several populations of elements demonstrate 
different extents of demethylation during preimplantation, including many 
that retain higher levels in cleavage and only minor, passive depletion into 
blastocyst. Upon human ES cell derivation or during mouse implantation, 
elements are generally remethylated, though only partially for those elements 
that are demethylated after fertilization. Hypermethylation is complete in fetal 
tissue. In human, these discrete dynamics can be attributed to the unstable 
methylation for L1HS-L1PA3a subfamilies while, in mouse, subsets of 
L1Md_Tf and L1Md_Gf subfamilies are similarly demethylated and elements 
of the independently emerging L1Md_A lineage remain largely methylated. 
c, Violin plots for SINEs highlight intermediate methylation in sperm in both 
species, though more so for humans. After fertilization, SINE methylation 
rapidly diminishes to near complete hypomethylation over preimplantation, 
similar to what is observed for intergenic sequence, before complete 
hypermethylation during ES cell derivation in human or in postimplantation 
mouse E6.5 embryos. Taken globally, SINEs appear to be uniformly regulated 
regardless of subfamily, though differences in regulatory status for specific 
SINE elements may be reflected by their surrounding genomic context.
Unfortunately, such inferences require higher genomic resolution than is 
currently available to distinguish the dynamics of specific integrations. 

d-g, Violin plots of the four major LTR families present in mouse over the 
complete preimplantation timeline. ERV1 elements (d) are hypermethylated in 
sperm and display a range of demethylation following fertilization and prompt 
remethylation upon implantation. In mouse, ERVK elements (e) are emergent 
and largely consist of the dominating, constitutively hypermethylated IAP 
subfamilies. ERVL and MalR (ERVL-MalR) elements (f and g), the 
evolutionarily oldest mammalian LTRs, are hypermethylated in sperm and 
rapidly demethylated after fertilization, frequently in association with their 
rapid zygotic induction. h, Distribution (as boxplots) of per element expression 
and CpG density at different methylation levels for LTR12C demonstrates a
negative correlation between methylation and expression. On average, LTR12C
is hypomethylated in sperm and the early embryo, but demonstrates a
consistent range of values at the level of single elements, with the least methylated
elements contributing the most to LTR12C expression. The CpG density of
these elements corresponds to their degree of hypomethylation, suggesting that 
escape from de novo methylation during spermatogenesis and preimplantation 
is maintained for specific elements over generations. Once targeted, element 
expression is apparently restricted and its CpG density decays correspondingly. 
During ES cell derivation, the kinetics of LTR12C methylation are more rapid for
those of lower CpG density, as evident from p0 to p5 in the ES cell lines. DNA 
methylation in the early embryo is therefore not exclusive to the regulation 
of different ERV1 subfamilies, but also affects the contribution of single 
elements to the broader transcriptional pattern. Bold line signifies the median, 
boxes and whiskers the 25th and 75th, and 2.5th and 97.5th percentiles, 
respectively. Expression is calculated as the number of fragments per million 
that align to a given element divided by its length in kb (FPKM). 
Extended Data Figure 10 | L1PA subfamily dynamics during human early
development. a, Expression composite averaged by genomic representation 
for L1HS through L1PA7 from oocyte through preimplantation and ES cell 
derivation. Dynamic expression within the L1PA phylogeny is restricted to the 
same subfamilies that are demethylated by cleavage. The position of each 
respective 5’ UTR, the functional promoter for LINEs, is highlighted in the 
legend. Beneath these composites is the genomic representation to the 
full-length consensus for each annotated L1PA subfamily, which demonstrates
relative equivalence of 5’ UTR representation across different subfamilies, 
but an increasing proportion of truncated 3’ fragments with subfamily age 
(Methods). b, The frequency of CpGs within aligned L1PA subfamilies, 
including 5’ UTR, ORF1 and ORF2, and 3’ UTR. CpGs are primarily enriched 
within the 5’ UTR promoter and become progressively CpG depleted with
element age. c, Complete composite plot of cleavage stage methylation values
across aligned 5’ UTRs from L1HS through L1PA7 as in Fig. 5d. The multiple 
sequence alignment for each subfamily to the assembled consensus is visualized 
below each composite, with blue corresponding to conservation, black to 
divergence, and white to gaps or deletions. The x axis represents position along 
the 5’ UTR and a portion of ORF1 for the L1HS consensus. CpG Frequency 
describes the level of conservation for individual CpGs found within single 
elements to the consensus. The ~130-bp sequence present from L1PA7 to 
L1PA3b and absent from L1PA3a to L1HS is highlighted in pink, while two 
older sequences specific to L1PA7 are highlighted in grey. d, Percent identity to 
the consensus for the extracted ~130-bp insert sequence in elements from 
L1PA7 through L1PA3b. Mean nucleotide identity to the consensus is 85%, 
with a median of 89%. 






doi:10.1038/nature13393 


Targeting transcription regulation in cancer with a covalent CDK7 inhibitor


Nicholas Kwiatkowski, Tinghu Zhang, Peter B. Rahl, Brian J. Abraham, Jessica Reddy, Scott B. Ficarro,
Anahita Dastur, Arnaud Amzallag, Sridhar Ramaswamy, Bethany Tesar, Catherine E. Jenkins, Nancy M. Hannett,
Douglas McMillin, Takaomi Sanda, Taebo Sim, Nam Doo Kim, Thomas Look, Constantine S. Mitsiades,
Andrew P. Weng, Jennifer R. Brown, Cyril H. Benes, Jarrod A. Marto, Richard A. Young & Nathanael S. Gray


Tumour oncogenes include transcription factors that co-opt the general
transcriptional machinery to sustain the oncogenic state, but direct
pharmacological inhibition of transcription factors has so far proven
difficult. However, the transcriptional machinery contains various
enzymatic cofactors that can be targeted for the development of new
therapeutic candidates, including cyclin-dependent kinases (CDKs).
Here we present the discovery and characterization of a covalent CDK7
inhibitor, THZ1, which has the unprecedented ability to target a remote
cysteine residue located outside of the canonical kinase domain,
providing an unanticipated means of achieving selectivity for CDK7.
Cancer cell-line profiling indicates that a subset of cancer cell lines,
including human T-cell acute lymphoblastic leukaemia (T-ALL), have
exceptional sensitivity to THZ1. Genome-wide analysis in Jurkat T-ALL
cells shows that THZ1 disproportionally affects transcription of RUNX1
and suggests that sensitivity to THZ1 may be due to vulnerability
conferred by the RUNX1 super-enhancer and the key role of RUNX1 in
the core transcriptional regulatory circuitry of these tumour cells.
Pharmacological modulation of CDK7 kinase activity may thus provide
an approach to identify and treat tumour types that are dependent
on transcription for maintenance of the oncogenic state.

In an effort to discover new inhibitors of kinases that regulate gene
transcription, we performed cell-based screening and kinase selectivity
profiling of a library of known and novel ATP-site-directed kinase
inhibitors (see Supplementary Table 1 for known CDK7 inhibitors). We
identified THZ1 (Fig. 1a), a phenylaminopyrimidine bearing a potentially
cysteine-reactive acrylamide moiety, as a low nanomolar inhibitor of cell
proliferation and biochemical CDK7 activity (Fig. 1b, c). To investigate
the functional relevance of the acrylamide moiety, we prepared a
non-cysteine reactive analogue, THZ1-R, which showed comparatively less
activity on CDK7 and reduced antiproliferative potency (Fig. 1b, c). KiNativ
profiling, which measures the ability of a compound to block
nucleotide-dependent enzymes from biotinylation with a reactive
desthiobiotin-ATP probe, established CDK7 as the primary intracellular
target of THZ1, but not of THZ1-R (Supplementary Table 2). Kinome-wide
profiling identified additional kinase targets of THZ1; however, we
confirmed CDK7 as the only target showing time-dependent inhibition, which
is suggestive of covalent binding (Extended Data Fig. 1a–c and
Supplementary Table 3).

As no covalent inhibitors of CDKs have been reported, we next focused
our studies on the mechanism by which THZ1 could achieve covalent
inhibition of CDK7. We first incubated a recombinant CDK7-cyclin H-MAT1
(MAT1 also known as MNAT1) trimeric complex with a biotinylated
version of THZ1 (bio-THZ1; Fig. 1a) and demonstrated that it indeed
covalently modifies CDK7 (Fig. 2a and Extended Data Fig. 1d–g). Mass
spectrometry identified the site of covalent modification as C312, a residue
located outside the kinase domain (Extended Data Fig. 2a–d). Inspection
of the crystal structure revealed that a carboxy-terminal extension of
CDK7 bearing C312 traverses the ATP cleft in the kinase domain and would
be predicted to position C312 directly adjacent to the reactive acrylamide
moiety of THZ1 (Fig. 2b). Mutation to serine (C312S), a less nucleophilic
amino acid, prevented THZ1 from covalently binding to CDK7 and from
inhibiting CDK7 activity in an irreversible fashion (Fig. 2c and Extended
Data Fig. 2e). Sequence alignment of the 20-member CDK family suggests
that C312 is unique to CDK7; however, CDK12 and CDK13 also possess
accessible cysteines within four amino acids of C312 (Extended Data Fig. 3a).
Indeed, we found that THZ1 can inhibit CDK12 kinase activity at slightly
higher concentrations (Extended Data Fig. 3b–f). To our knowledge, THZ1
is the first inhibitor that has been demonstrated to target a cysteine located
outside of the kinase domain, which provides an unanticipated means
of achieving covalent selectivity.

Figure 1 | Cell-based screening and kinome profiling identifies
phenylamino-pyrimidines as a potential CDK7 scaffold. a, Compound
structures of THZ1, THZ1-R and bio-THZ1. b, THZ1 potently inhibits
proliferation of Jurkat and Loucy T-ALL cell lines. Cell lines were treated with
THZ1 or THZ1-R for 72 h. Experiments were performed in biological triplicates.
Error bars show ± standard deviation (s.d.). c, THZ1 and THZ1-R have different
binding affinities for CDK7. LanthaScreen Eu Kinase Binding Assay was
conducted at Life Technologies in a time-dependent manner. Dissociation
constant (Kd) values are shown after 180 min incubation with compounds.
Experiments were performed in biological triplicates. Error bars show ± s.d.

CDK7 kinase activity has been implicated in the regulation of both
transcription, where it phosphorylates the C-terminal domain (CTD) of
RNA polymerase II (RNAPII) and CDK9 (ref. 9), and the cell cycle,
where it functions as the CDK-activating kinase (CAK) for CDK1, 2, 4 and
6 (refs 10–14). THZ1, but not THZ1-R, completely inhibits the
phosphorylation of the established intracellular CDK7 substrate RNAPII CTD at
Ser 5 and Ser 7 (refs 6, 8), with concurrent loss of Ser 2 phosphorylation at
250 nM in Jurkat cells (Fig. 2d). Cellular washout experiments demonstrate
that THZ1 indeed acts in an irreversible fashion (Fig. 2e, f and Extended
Data Fig. 4a–e). We observed a loss of CAK activity, as evidenced by
decreased phosphorylation of the activation loops of CDK1, 2 and 9,
indicating disruption of both recognized CDK7 signalling pathways in Jurkat
cells (Fig. 2d and Extended Data Fig. 4f, g) and Loucy cell lines (Extended
Data Fig. 4). Ectopic expression of doxycycline-inducible Flag-tagged
CDK7 C312S, but not Flag-CDK7 wild type, in HeLa S3 cells restored
RNAPII CTD phosphorylated (p)-Ser 5/7 to near wild-type levels at
concentrations of THZ1 up to 2.5 µM, establishing C312 as a critical
determinant of the cellular pharmacology of the inhibitor (Extended Data
Fig. 5a, b). Additionally, Flag-CDK7 C312S expression restored CDK1/2
T-loop phosphorylation, reduced early induction of cleaved poly-ADP
ribose polymerase (PARP) and restored the expression of a subset of genes,
including the highly expressed transcription factors MYC, KLF4, ID1 and
GATA2 (Extended Data Fig. 5c–e). The partial rescues of the
hyperphosphorylated form of RNAPII (RNAPIIO) and RNAPII p-Ser 2 CTD
phosphorylation combined with the incomplete restoration of gene expression
may result, in part, from lower-affinity cross-reactivity of THZ1 with CDK12
and 13, which are bona fide Ser 2 kinases.

Figure 2 | THZ1 irreversibly inhibits RNAPII CTD phosphorylation by
covalently targeting a unique cysteine located outside the kinase domain of
CDK7. a, Bio-THZ1 binds irreversibly to CDK7. Recombinant CAK complex was
incubated with bio-THZ1 with or without THZ1 at 37 °C for 4 h and
biotinylated proteins were resolved by SDS-polyacrylamide gel
electrophoresis (SDS-PAGE). IB, immunoblot. b, Docking model of THZ1 in the
ATP-binding pocket of CDK7 (Protein Data Bank accession 1UA2). CDK7 is
depicted with grey ribbons and THZ1 in turquoise. Key residues are indicated.
C312 has been modelled into the crystal structure. c, Mutation of C312 to
serine (C312S) rescues wild-type kinase activity in the presence of THZ1.
HCT116 cells stably expressing Flag-tagged CDK7 proteins were treated with
THZ1 or THZ1-R for 4 h. Exogenous CDK7 proteins were immunoprecipitated
with Flag antibody and subjected to in vitro kinase assays. CS, Coomassie
stain; KD, kinase-dead; WT, wild type. d, THZ1 inhibits RNAPII CTD
phosphorylation. Jurkat cells were treated with THZ1 or THZ1-R for 4 h
and proteins of interest were resolved by SDS-PAGE. e, THZ1, but not
THZ1-R, shows irreversible inactivation of CDK7. Jurkat cells were
treated with THZ1 or THZ1-R for 4 h followed by washout of
inhibitor-containing medium. Cells were then allowed to grow in medium
without inhibitor for 0–6 h. ‘N’ indicates no washout, meaning that cells
were treated with compound for the duration of the experiment (10 h).
f, Antiproliferative effects of THZ1 are impervious to inhibitor washout.
Jurkat cells were treated with THZ1 or THZ1-R in dose-response format for
72 h. Experiments were performed in biological triplicates. Error bars
show ± s.d.

Our evidence that CDK7 inhibition leads to a reduction in RNAPII
CTD phosphorylation status seems to be in conflict with evidence that
inhibition of CDK7 alone is insufficient to reduce RNAPII CTD
phosphorylation in HCT116 cells. It is possible that covalent inhibition and
reversible inhibition can engender different effects on kinase structure;
we did not find evidence that THZ1 affects TFIIH or CAK complex stability
(Extended Data Fig. 4h). It is also possible that inhibition of CDK12/13 (or
another undetected kinase) contributes to reduced RNAPII CTD
phosphorylation, although our evidence that RNAPII CTD phosphorylation
levels are restored after expression of CDK7 C312S suggests otherwise.

To understand better the breadth of antiproliferative activity of THZ1,
we screened it against a diverse panel of over 1,000 cancer cell lines.
THZ1 showed broad-based activity with half-maximum inhibitory
concentration (IC50) values less than 200 nM against 53% of the cell
lines tested (Fig. 3a and Supplementary Table 4). Elastic net regression
analysis incorporating gene expression, copy number and sequence
variation genomics data across 527 of the cell lines tested was used to
identify genomic features common to sensitive cell lines. Gene ontology
(GO) term enrichment analysis indicated a strong enrichment of
(proto-) oncogenic transcription factors commonly overexpressed in
cancer and factors involved in RNAPII-driven transcriptional regulation,
suggesting that the dominant activity of THZ1 was through modulation of
transcription (Fig. 3b and Supplementary Table 5).

In agreement with the elastic net regression analysis, T-ALL cell lines
that show characteristic misregulation of T-cell lineage-specific
transcription factors were broadly sensitive to THZ1, but not to THZ1-R
(Fig. 3c, Extended Data Fig. 6a and Supplementary Table 4). Treatment
of T-ALL cell lines with THZ1 caused decreased cellular proliferation
and an increase in apoptotic index with concomitant reduction in
anti-apoptotic proteins, most notably MCL1 and XIAP (Extended Data Figs 6
and 7). These strong antiproliferative responses induced at sub-effective
doses of THZ1 suggest that T-ALL cells may be particularly sensitive to
small perturbations in transcription and CDK7 kinase function. Indeed,
THZ1 potently reduced the viability of patient-derived T-ALL and chronic
lymphocytic leukaemia (CLL) cells (Extended Data Fig. 8a, b). Moreover,
THZ1 exhibited efficacy in a bioluminescent xenografted mouse model
using the human T-ALL cell line KOPTK1, when dosed twice daily at
10 mg kg⁻¹ (Fig. 3d, e, Extended Data Fig. 8 and Supplementary Table 6).
Importantly, THZ1 was well tolerated at these doses with no observable
body weight loss or behavioural changes (Extended Data Fig. 8f),
suggesting that it caused no overt toxicity in the animals. These results were
mirrored in cell culture with non-transformed BJ fibroblast and retinal
pigment epithelial (RPE-1) cells responding to relatively high doses of
THZ1 by undergoing cell-cycle arrest rather than initiating apoptosis or
cell death, further suggesting that normal cells might tolerate
transcriptional disruption (Extended Data Fig. 9).

Figure 3 | THZ1 strongly reduces the proliferation and cell viability of
T-ALL cell lines. a, THZ1 exhibits strong antiproliferative effects across a
broad range of cancer cell lines from various cancer types. Cells were treated
with THZ1 or DMSO vehicle for 72 h and assessed for antiproliferative effects
using resazurin. b, Overexpression of transcriptional regulators, including
(proto-) oncogenic transcription factors, is a strong predictor of cell-line
sensitivity to THZ1. GO terms associated with overexpressed factors found in
THZ1-sensitive cell lines. c, THZ1 shows a strong antiproliferative effect
against T-ALL cell lines. BJ fibroblasts and RPE-1 cells are shown as normal cell
lines. Cells were treated with THZ1 or DMSO vehicle for 72 h. Experiments
were performed in biological triplicates. Error bars show ± s.d. d, THZ1
reduces the proliferation of KOPTK1 T-ALL cells in a human xenograft mouse
model. Bioluminescent images of two representative mice treated with either
vehicle control, 10 mg kg⁻¹ THZ1 once daily (qD), or 10 mg kg⁻¹ THZ1 twice
daily (BID) for 29 days. e, Relative bioluminescence of mice treated with
vehicle, 10 mg kg⁻¹ THZ1 once daily, or 10 mg kg⁻¹ THZ1 twice daily during
the 29 days of treatment. n = 10 for all groups. Bioluminescence is shown
relative to day 0 and is plotted as average ± standard error of the mean.
Analysis of the bioluminescence data by repeated measures two-way analysis of
variance (ANOVA) reveals that the antiproliferative effect of treatment with
THZ1 twice daily differs significantly (P < 0.0001) from that of the other
treatments.
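
The xenograft read-out in Fig. 3e is described as bioluminescence relative to day 0, plotted as the average ± standard error of the mean. A minimal sketch of that arithmetic follows, assuming each animal contributes one raw signal per imaging day and that each animal is normalized to its own day-0 value; the function names and the example readings are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def relative_signal(series):
    """Divide every time point of one animal's series by its day-0 value."""
    baseline = series[0]
    if baseline <= 0:
        raise ValueError("day-0 signal must be positive")
    return [value / baseline for value in series]


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for one time point."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two animals to estimate the s.e.m.")
    return mean(values), stdev(values) / sqrt(n)


def summarise_group(raw_by_animal):
    """Summarise one treatment group as [(mean, sem), ...] per time point."""
    relative = [relative_signal(series) for series in raw_by_animal]
    return [mean_and_sem(list(point)) for point in zip(*relative)]


# Hypothetical raw readings (arbitrary units) for three animals imaged on
# three days; illustrative numbers only, not data from the paper.
example_group = [
    [100.0, 200.0, 400.0],
    [50.0, 150.0, 250.0],
    [200.0, 400.0, 1200.0],
]
summary = summarise_group(example_group)
```

Normalizing each animal to its own baseline before averaging keeps differences in initial tumour burden from dominating the group mean.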

CDK7 is a component of the general transcription factor IIH (TFIIH)
complex, so we next investigated how THZ1 treatment affects genome-wide
gene expression. We chose Jurkat T-ALL cells for these studies because
they are a well-studied T-ALL cell-line model with a defined core
transcriptional regulatory circuitry consisting of key transcription factors,
which is also found in human T-ALL primagrafts. Treatment with 250 nM
THZ1, but not THZ1-R, led to progressive reduction in global steady-state
messenger RNA levels over time, with 75% and 96% of mRNAs showing
greater than two-fold reduction by 6 and 12 h, respectively (Fig. 4a,
Extended Data Fig. 10a and Supplementary Table 7). Consistent with
global downregulation of mRNA transcripts, 250 nM THZ1 reduced
RNAPII occupancy genome wide at both promoters and gene bodies
(Fig. 4b). By comparison, Flavopiridol reduced RNAPII density only
across gene bodies (Fig. 4b). This is consistent with the model that CDK7
regulates RNAPII initiation and pausing whereas CDK9 regulates pause
release leading to processive elongation.
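
The statement that a given percentage of mRNAs shows a greater than two-fold reduction can be illustrated with a small calculation. The sketch below assumes normalized expression values for treated and control samples and adds a pseudocount of 1 before taking the ratio; the pseudocount, the function names and the example values are assumptions made for illustration and are not described in the text.

```python
from math import log2


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of treated to control expression.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero for transcripts with no signal.
    """
    return log2((treated + pseudocount) / (control + pseudocount))


def fraction_reduced(treated_levels, control_levels, fold=2.0):
    """Fraction of transcripts reduced by more than `fold` versus control."""
    if len(treated_levels) != len(control_levels) or not treated_levels:
        raise ValueError("need two non-empty lists of equal length")
    threshold = -log2(fold)
    changes = [
        log2_fold_change(t, c) for t, c in zip(treated_levels, control_levels)
    ]
    reduced = sum(1 for change in changes if change < threshold)
    return reduced / len(changes)


# Hypothetical expression values for four transcripts (illustrative only).
control = [99.0, 199.0, 49.0, 399.0]
treated = [9.0, 99.0, 39.0, 49.0]
fraction = fraction_reduced(treated, control)
```

In this toy example two of the four transcripts fall below the threshold; a transcript reduced by exactly two-fold is not counted, because the criterion is "greater than two-fold".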

Although 250 nM THZ1 inhibits global transcription, we found that
some cancer cell lines, particularly T-ALL, are sensitive to considerably
lower concentrations of THZ1. We postulated that the expression of
certain genes might be especially sensitive to low doses of THZ1 and
therefore have a key role in driving the cellular response. Indeed, we
found that transcripts for only a subset of genes were substantially
affected by treatment with 50 nM THZ1, with that for RUNX1 among
the most profoundly affected (Fig. 4c). There are at least two reasons why
low-dose THZ1 treatment might cause a preferential loss of RUNX1
expression. Tumour-cell oncogenes can acquire super-enhancers, which drive
high-level expression yet can be especially sensitive to perturbation.
Super-enhancer analysis in Jurkat cells revealed that RUNX1 contains
an exceptionally large super-enhancer domain containing a previously
described haematopoietic-cell-specific enhancer (Fig. 4d, Extended Data
Fig. 10b–d and Supplementary Table 8). In addition, RUNX1 forms a
core regulatory circuitry with two additional transcription factors that
have prominent roles in leukaemia biology, TAL1 and GATA3 (Fig. 4e).
These factors autoregulate their own gene expression while simultaneously
regulating many other genes that comprise the active gene expression
program of Jurkat cells. Treatment with 50 nM THZ1 led to a significant
reduction in both the transcript and protein levels of RUNX1, TAL1 and GATA3
(Extended Data Fig. 10e, f). Loss of the RUNX1-driven transcriptional
program is probably key to the response to low-dose THZ1 treatment, as
gene set enrichment analysis revealed that the Jurkat transcripts
downregulated by 50 nM THZ1 were enriched in transcripts similarly
downregulated after RUNX1 depletion using short hairpin (sh)RNA (Fig. 4f).

Figure 4 | THZ1 preferentially downregulates Jurkat core transcriptional
circuitry. a, THZ1 treatment globally downregulates steady-state mRNA
levels in a time-dependent manner. Jurkat cells were treated with THZ1
(250 nM) for the indicated amounts of time. Heatmaps display the log2 fold
change in gene expression versus DMSO for the 21,970 transcripts expressed at
12 h in DMSO. b, THZ1 reduces RNAPII occupancy across promoters and
gene bodies. Gene tracks of RNAPII ChIP-seq occupancy at RUNX1 after the
indicated treatments (top). Metagene representation of global RNAPII
occupancy at promoters and gene bodies (bottom). Average background
subtracted chromatin immunoprecipitation followed by sequencing (ChIP-seq)
signal in 22,310 genes expressed in 6 h DMSO conditions in units of reads
per million per base pair (RPM bp⁻¹). Signal of ChIP-seq occupancy is in units
of RPM per bin. All treatments were 6 h with 250 nM of THZ1, THZ1-R or
Flavopiridol. The metagene representation spans 2 kb upstream from the
transcription start site (TSS) to 2 kb downstream from the transcription end
site (TES). c, THZ1 treatment delineates a subset of transcripts equally sensitive
to low (50 nM) and high (250 nM) dose THZ1. Log2 fold change in gene
expression for 50 nM (x-axis) and 250 nM THZ1 (y-axis) after a 4 h treatment.
Pearson coefficient r = 0.50. d, Gene tracks of H3K27ac (top), CDK7 (middle)
and RNAPII (bottom) ChIP-seq occupancy at the TSS, gene body, and a
previously described enhancer region in the first intron of RUNX1 (ref. 29). Total
ChIP-seq signal is in units of RPM per bin. e, Positive interconnected
autoregulatory loop formed by RUNX1, TAL1 and GATA3. Genes are
represented by rectangles, and proteins are represented by ovals. f, Transcripts
downregulated by low-dose THZ1 are enriched for transcripts downregulated
after RUNX1 knockdown. Gene set enrichment analysis of top 500 transcripts
downregulated after a 4 h treatment with THZ1 (50 nM) in comparison to
transcripts following a RUNX1 knockdown. Gene set enrichment
analysis-supplied P value < 0.001.
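
The comparison of low-dose and high-dose responses in Fig. 4c is summarized by a Pearson coefficient. As a minimal sketch, the coefficient can be computed from two equal-length lists of per-transcript log2 fold changes as follows; the function name and the example values are hypothetical, and this is the textbook formula rather than the authors' analysis code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical log2 fold changes for four transcripts at a low dose (x)
# and a high dose (y); illustrative values only.
low_dose = [0.0, -1.0, -2.0, -3.0]
high_dose = [-1.0, -3.0, -2.0, -4.0]
r = pearson_r(low_dose, high_dose)
```

A moderate coefficient across all transcripts is compatible with a subset that responds equally at both doses, since most transcripts respond far more strongly at the higher dose.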


We have reported the discovery and characterization of a covalent
inhibitor of CDK7, THZ1. THZ1 uses a unique mechanism, combining
ATP-site and allosteric covalent binding, as a means of attaining potency
and selectivity for CDK7. This mechanistic insight should be useful for
designing next-generation inhibitors of CDKs, for which high sequence
and shape homology in the ATP pocket has posed a formidable challenge
to achieving selectivity with conventional ATP-competitive inhibitors.
THZ1 showed potent antiproliferative activity on T-ALL cell
lines and other blood cancers, in which oncogenic transcription factors
feature prominently in the disease state. In Jurkat cells, low-dose THZ1
had a profound effect on a small subset of genes, including the key
regulator RUNX1, thus contributing to subsequent loss of the greater gene
expression program and cell death. Identification of additional cancer
cell lines whose gene expression programs show vulnerability to THZ1
or other transcriptional inhibitors should delineate other cancers that
are susceptible to perturbation of transcription.


METHODS SUMMARY 

T-ALL culture conditions. Jurkat, Loucy, KOPTK1 and DND-41 cell lines were
grown in RPMI-1640 supplemented with 10% fetal bovine serum (FBS) and 1%
glutamine. All cell lines were cultured at 37 °C in a humidified chamber in the
presence of 5% CO2, unless otherwise noted.
Inhibitor treatment experiments. Time-course experiments such as those described
in Extended Data Fig. 5a were conducted to determine the minimal time required for
full inactivation of CDK7. Cells were treated with THZ1, THZ1-R or
dimethylsulphoxide (DMSO) for 0–6 h to assess the effect of time on the
THZ1-mediated inhibition of RNAPII CTD phosphorylation. For subsequent
experiments cells were treated with compounds for 4 h as determined by the
time-course experiment described earlier, unless otherwise noted. For inhibitor
washout experiments (Fig. 2e, f and Extended Data Fig. 5) cells were treated
with THZ1, THZ1-R or DMSO for 4 h. Medium containing inhibitors was
subsequently removed to effectively ‘wash out’ the compound and the cells were
allowed to grow in the absence of inhibitor. For each experiment, lysates were
probed for RNAPII CTD phosphorylation and other specified proteins.
High-throughput cell-line panel viability assay. Cells were seeded in 384-well
microplates at ~15% confluency in medium with 5% FBS and penicillin/streptomycin.
Cells were treated with THZ1 or DMSO for 72 h and cell viability was determined
using resazurin.
RNA extraction and synthetic RNA spike-in. Total RNA extraction and sample
preparation were performed as previously described. Briefly, after inhibitor
treatment cell number was determined, total RNA was isolated, and ERCC RNA
Spike-In Mix (Ambion, catalogue no. 4456740) was added to total RNA relative to
cell number. Expanded protocols and synthetic chemistry schemes can be found in
Supplementary Information.
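
Adding spike-in RNA in proportion to cell number makes it possible to compare expression on a per-cell basis, which matters when a treatment lowers total mRNA levels. The sketch below shows one simple way such a correction could work, assuming a single linear scale factor derived from the summed spike-in signal of a sample relative to a reference sample. This is an illustrative simplification with hypothetical names and values; the normalization actually used is the one described in the cited protocol and Supplementary Information.

```python
def spike_in_scale_factor(sample_spike_signal, reference_spike_signal):
    """Factor that brings a sample's total spike-in signal to the reference."""
    if not sample_spike_signal or len(sample_spike_signal) != len(
        reference_spike_signal
    ):
        raise ValueError("spike-in lists must be non-empty and of equal length")
    sample_total = sum(sample_spike_signal)
    if sample_total <= 0:
        raise ValueError("sample spike-in signal must be positive")
    return sum(reference_spike_signal) / sample_total


def normalise_to_spike_in(gene_signal, sample_spike_signal, reference_spike_signal):
    """Rescale every gene in a sample by the spike-in-derived factor."""
    factor = spike_in_scale_factor(sample_spike_signal, reference_spike_signal)
    return {gene: value * factor for gene, value in gene_signal.items()}


# Hypothetical example: the spike-ins read twice as high in the treated
# sample as in the control, which is what happens when endogenous RNA per
# cell has dropped and the data were scaled to equal total signal.
control_spikes = [100.0, 200.0]
treated_spikes = [200.0, 400.0]
treated_genes = {"GENE_A": 80.0, "GENE_B": 20.0}
normalised = normalise_to_spike_in(treated_genes, treated_spikes, control_spikes)
```

Without a correction of this kind, a global decrease in transcription would be masked by normalization methods that assume equal total RNA across samples.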


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

Extended Data Figure 1 | THZ1 demonstrates time-dependent inhibition of
CDK7 in vitro and covalent binding of intracellular CDK7. a, THZ1 but
not THZ1-R shows time-dependent inhibition. LanthaScreen Eu Kinase
Binding Assay was conducted at Life Technologies in a time-dependent
manner (20, 60 and 180 min), demonstrating that THZ1 but not THZ1-R
shows time-dependent inhibition of CDK7. b, c, Pre-incubation of THZ1
increases CDK7 inhibitory activity in vitro. Recombinant CAK complex was
incubated with THZ1 (b) or THZ1-R (c) in a dose-response format with or
without pre-incubation before ATP (25 µM) addition. The kinase reaction was
then allowed to proceed for 45 min at 30 °C. d, Workflow of bio-THZ1
pull-down competition experiment. e, Bio-THZ1 pulls down CDK7 from cellular
lysates. Loucy cellular lysates were incubated with bio-THZ1 (1 µM) with or
without THZ1 (10 µM) and streptavidin-precipitated proteins were probed for
CDK7. IB, immunoblot. f, Free intracellular THZ1 competes in a
dose-dependent manner for bio-THZ1 binding to CDK7. Loucy cells were treated
with increasing concentrations of THZ1 or with 10 µM THZ1-R for 4 h.
Cellular lysates were incubated with bio-THZ1 and processed as
indicated in a. g, Bio-THZ1 labels CDK7 in lysates. Loucy cellular lysates
were incubated with bio-THZ1 at 4 °C for 12 h followed by
immunoprecipitation of CDK7 at 4 °C for 3 h. Precipitated proteins were
washed and probed with horseradish peroxidase (HRP)-conjugated
streptavidin.

Extended Data Figure 2 | THZ1 covalently binds CDK7 C312. a, b, Total ion
chromatograms (TIC) and extracted ion chromatograms (XIC) for CDK7 
peptides recorded during analysis of CAK complexes treated with DMSO (a) or 
THZ1 (b). c, Efficiency of labelling was estimated to be approximately 85%, as 
gauged by the reduction in signal of triply and quadruply charged 
YFSNRPGPTPGCQLPRPNCPVETLK ions (residues 294-318). The peptides 
VPFLPGDSDLDQLTR (residues 180-194) and LDFLGEGQFATVYK 
(residues 15-28) were used for normalization. d, Orbitrap HCD tandem mass 
spectrometry (MS/MS) spectrum of a quadruply charged CDK7-derived 
peptide (residues 294-318) labelled by THZ1 at C312. Fragment ions 
containing the peptide C terminus (y-type) or N terminus (b-type), along with 
the associated mass errors are shown in red and blue, respectively. Fragment
ions marked by an asterisk contain the inhibitor and have the expected 
heavy isotope contribution from chlorine. The site of labelling was determined 
to be C312 (as opposed to C305) on the basis of fragment ions observed in 
additional MS/MS spectra (for example, y11°* observed with <3 p.p.m. mass 
error by fragmentation of the +6 charged precursor; see inset mass spectrum). 
e, C312S mutation eliminates THZ1 covalent binding. Cellular lysates from 
HCT116 cells expressing either Flag-CDK7 wild type or C312S were incubated
with bio-THZ1 for 12 h at 4 °C and then at room temperature for 3 h to
facilitate covalent binding. Precipitated proteins were then probed for the 
presence of Flag-tagged CDK7. 
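
Two quantities in this legend lend themselves to a short worked example: the mass error in parts per million and the labelling efficiency estimated from the loss of unmodified-peptide signal after normalization to reference peptides. The sketch below assumes that efficiency is one minus the ratio of normalized unmodified-peptide signal in treated versus control samples, with the mean of the reference-peptide signals as the normalizer. The function names and all numerical values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean


def ppm_error(observed_mz, theoretical_mz):
    """Mass error of an observed m/z in parts per million."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def labelling_efficiency(
    unmodified_treated, unmodified_control, reference_treated, reference_control
):
    """Estimate the labelled fraction from loss of unmodified-peptide signal.

    Each unmodified-peptide signal is divided by the mean signal of the
    reference peptides from the same run before the two runs are compared.
    """
    normalised_treated = unmodified_treated / mean(reference_treated)
    normalised_control = unmodified_control / mean(reference_control)
    return 1.0 - normalised_treated / normalised_control


# Hypothetical ion intensities (arbitrary units), illustrative only.
efficiency = labelling_efficiency(
    unmodified_treated=135.0,
    unmodified_control=1000.0,
    reference_treated=[450.0, 450.0],
    reference_control=[500.0, 500.0],
)
error = ppm_error(observed_mz=500.001, theoretical_mz=500.0)
```

Normalizing to peptides that the inhibitor does not modify corrects for run-to-run differences in the amount of protein analysed.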

Extended Data Figure 3 | THZ1 inhibits CDK12 but at higher
concentrations compared with CDK7. a, Protein sequence alignment of the
C-terminal regions of all human (hs) CDKs and mouse (m) CDK7 using
Uniprot default settings. Note that the canonical cell-cycle CDKs 1, 2 and 4, as
well as 5, do not have C-terminal domains that extend to the equivalent position
of CDK7 C312 and therefore do not show aligned sequence in this region.
b, Bio-THZ1 covalently pulls down CDK7 from cellular lysates. Jurkat cellular
lysates were incubated with bio-THZ1 (1 µM) at 4 °C for 12 h and for 2 h at
room temperature. Precipitated proteins were washed with or without urea
(4 M), here used as a denaturing agent, and probed for the indicated CDKs.
c, Bio-THZ1 pulls down Flag-CDK12 from lysates. Lysates from 293A cells
stably expressing Flag-tagged wild-type CDK12 were incubated with bio-THZ1
(1 µM) at 4 °C for 12 h and for 2 h at room temperature. Immunoprecipitated
proteins were probed with Flag antibody to recognize CDK12 or with CDK7
antibody. d, Bio-THZ1 pulls down cyclin K from cellular lysates. Jurkat cellular
lysates were incubated with bio-THZ1 (1 µM) at 4 °C for 12 h and for 2 h at
room temperature. Precipitated proteins were probed for the indicated
proteins. e, THZ1 inhibits CDK12 in an in vitro kinase assay. 293A cells
stably expressing Flag-tagged wild-type CDK12 were treated with THZ1 or
THZ1-R for 4 h. Exogenous CDK12 was immunoprecipitated from cellular
lysates using Flag antibody. Precipitated proteins were washed and subjected
to in vitro kinase assays at 30 °C for 30 min using the large subunit of
RNAPII (RPB1) as substrate and 25 µM ATP. CS, Coomassie stain.
f, Quantification of in vitro kinase assay conducted in d.

Extended Data Figure 4 | THZ1 irreversibly inhibits RNAPII CTD and
CAK phosphorylation. a, THZ1 exhibits time-dependent inactivation of 
intracellular CDK7. Loucy cells were treated with THZ1 or THZ1-R for 0-4h. 
At each time point, cells were harvested, lysed and the cellular lysates were 
probed with antibodies against the specified proteins. b, THZ1 inhibits RNAPII 
CTD phosphorylation. Loucy cells were treated with THZ1 or THZ1-R for 4h. 
Cellular lysates were then probed with antibodies recognizing the Ser 2, Ser 5 
and Ser 7 CTD RNAPII phospho-epitopes. c, Loucy cells were treated with 
THZ1 or THZ1-R for 4h followed by washout of inhibitor-containing medium. 
Cells were allowed to grow in medium without inhibitor for 0-6 h. At each time 
point cells were lysed and the cellular lysates were probed with antibodies 
against the specified proteins. ‘N’ indicates cells for which medium was never 
washed out. d, Apoptotic signalling is maintained despite washout of THZ1. 
Loucy cells were treated with THZ1 or THZ1-R for 4h followed by washout of 
inhibitor-containing medium, at which point cells were allowed to grow in 
medium with or without inhibitor for 0-48 h. At each time point, cells were 
lysed and the cellular lysates were probed with antibodies against the specified 
proteins. e, Antiproliferative effects of THZ1 are impervious to inhibitor
washout. Loucy cells were treated with THZ1 or THZ1-R in a dose-response 
format for 72h. Antiproliferative effects were determined using CellTiter-Glo 
analysis. f, THZ1 reduces the T-loop phosphorylation status of CDK1 and 
CDK2 in Jurkat cells over a 3h exposure. Asynchronous cells were treated with 
increasing concentrations of THZ1 or THZ1-R for 3h. Cellular lysates were 
then probed with antibodies against the indicated proteins or phosphoproteins. 
g, THZ1, but not THZ1-R, completely inhibits T-loop phosphorylation of 
CDK1 and CDK2 after treatment over one cell cycle. Loucy cells were treated 
with THZ1, THZ1-R, Flavopiridol or DMSO vehicle at the indicated 
concentrations for 24 and 14h, respectively (roughly one cell cycle). Cell lysates 
were harvested and probed with antibodies against the specified proteins or 
phosphoproteins. h, HeLa S3 cells stably expressing Flag-tagged wild-type
CDK7 were treated with THZ1 (1 µM) or DMSO vehicle for 5 h with or without
the presence of doxycycline. Proteins were immunoprecipitated using Flag 
antibody. Precipitated proteins were probed using the indicated antibodies. 
Asterisk indicates heavy chain from IgG antibody. 

Extended Data Figure 5 | Mutation of CDK7 C312 to serine rescues Ser 5/7
and partially rescues Ser 2 RNAPII CTD phosphorylation. a, Expression
of C312S rescues Ser 5/7 and partially rescues Ser 2 RNAPII CTD
phosphorylation. HeLa S3 cells stably carrying a doxycycline-inducible
Flag-CDK7 C312S construct were treated with THZ1 or DMSO for 5 h with or
without the presence of doxycycline. Cellular lysates were then probed for the
indicated proteins. b, Phenotypic rescue is specific to the C312S mutation, as
rescue is not achieved with overexpression of Flag-CDK7 wild type (WT).
HeLa S3 cells stably carrying doxycycline-inducible Flag-CDK7 wild-type and
C312S constructs (or empty vector) were treated with THZ1 or DMSO
for 5 h in the presence of doxycycline. c, Expression of C312S largely restores
CDK1/2 T-loop phosphorylation. HeLa S3 cells stably carrying a
doxycycline-inducible Flag-CDK7 C312S construct were treated with THZ1 or
DMSO for 5 h with or without the presence of doxycycline. Cellular lysates
were then probed for the indicated proteins or phosphoproteins.
d, Overexpression of Flag-CDK7 C312S rescues the expression of a subset of
transcripts in HeLa S3 cells. Log2 fold change in gene expression in HeLa S3
cells expressing Flag-CDK7 wild type (x-axis) and Flag-CDK7 C312S (y-axis)
after a 4 h treatment with 500 nM THZ1. e, GO molecular function analysis of
transcripts increased by 1 log2 order or more after expression of Flag-CDK7
C312S compared with Flag-CDK7 wild type in the presence of 500 nM THZ1.

Extended Data Figure 6 | THZ1 potently disrupts T-ALL proliferation.
a, THZ1, but not THZ1-R, exhibits strong antiproliferative effects against
T-ALL cell lines. Cells were treated with THZ1, THZ1-R or DMSO vehicle for
72 h and assessed for antiproliferative effect by CellTiter Glo analysis. Error
bars show ± s.d. b, THZ1 causes cell-cycle arrest. Jurkat (top) and Loucy
(bottom) T-ALL cells were treated with THZ1 for the indicated time
periods. Cell-cycle progression was assessed using FACS cell-cycle analysis.
2N = G1, 4N = G2. c, Treatment with THZ1 decreases CDK1/2 T-loop
phosphorylation. Jurkat cells were incubated with THZ1 for the indicated
duration of time and lysates were probed for the specified proteins.
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Extended Data Figure 7 | Treatment with THZ1 induces apoptosis in 
T-ALL cells. a, Representative annexin V and propidium iodide stainings for 
Jurkat cells incubated with THZ1 for the indicated amount of time and 
harvested to determine the percentage of apoptotic and/or dead cells by 
annexin V and propidium iodide staining, respectively. The percentage of cells 
in each cell population is shown in the four quadrants. b, Treatment with THZ1 
induces apoptosis. Quantification of annexin V and propidium iodide staining 
data from a. Experiments were performed in biological triplicates. Error bars 
show + s.d. c, Representative annexin V and propidium iodide stainings for 
Loucy cells incubated with THZ1 for the indicated amount of time and
harvested to determine the percentage of apoptotic and/or dead cells by
annexin V and propidium iodide staining, respectively. The percentage of cells 
in each cell population is shown in the four quadrants. d, Treatment with THZ1 
induces apoptosis. Quantification of annexin V and propidium iodide staining 
data from c. Experiments were performed in biological triplicates. Error bars 
show ± s.d. e, f, Sustained treatment with THZ1 induces apoptosis coincident
with loss of RNAPII CTD phosphorylation and a reduction in anti-apoptotic 
proteins. Jurkat (e) and Loucy (f) cells were incubated with THZ1 for the 
indicated duration of time and lysates were probed for the specified proteins. 
Apoptosis was monitored by PARP cleavage. 
Extended Data Figure 8 | THZ1 demonstrates potent killing of primary
chronic lymphocytic leukaemia cells and antiproliferative activity against 
primary T-ALL cells and in vivo against a human T-ALL xenograft. 

a, Patient-derived chronic lymphocytic leukaemia (CLL) samples were 
obtained and cultured in vitro for 24 hours in the presence of escalating doses 
of the specified compounds (n = 10 samples, 1 technical replicate per 
condition, per sample). Cell death upon compound exposure was evaluated 
using FITC Annexin V Apoptosis Kit I (BD Biosciences) and 1x10* events were 
collected and analysed using a BD FACSCanto II flow cytometer. Results shown 
are mean normalized percentage death based on Annexin V and propidium 
iodide single- and double-positive cells (± s.d.) normalized to baseline death in
the vehicle (DMSO) control condition of each sample. Compounds tested were 
THZ1, THZ1-R and Flavopiridol (THZ1 versus THZ1-R P=1.5X 10 **; 
THZ1 versus Flavopiridol P = 0.05). P values were generated using an ANOVA 
model. b, Patient-derived xenografts (patient IDs 3255-1, M18-1-5 and D135- 
1-5; n = 3) were treated with THZ1 for 3 h followed by compound washout. An 
aliquot of input cells was then counted by flow cytometry using a known 
quantity of flow cytometry calibration beads (data not shown; Molecular 
Probes). The remaining cells were plated onto MS5-DL1 feeder cells in the 
presence of serum-free media (supplemented with 0.75 μM SR1, 10 ng ml⁻¹
interleukin (IL)-7, 10 ng ml⁻¹ IL-2). Seventy-two hours later, cultures were
harvested by vigorous pipetting with Trypsin, filtered through nylon mesh to
deplete feeders, and counted by flow cytometry using a known quantity of 
flow cytometry calibration beads and with gating to discriminate between 
T-ALL cells and carryover feeders. The final cell number was normalized to the 
input cell number to calculate fold expansion. This experiment was performed 
once per patient-derived sample. c, Bioluminescent images of two 
representative mice treated with either vehicle control, 10 mg kg⁻¹ THZ1 once
daily (qD), or 10 mg kg⁻¹ THZ1 twice daily (BID) for the indicated number
of days. d, Spleen tissue from mice treated with THZ1 shows decreased RNAPII
CTD phosphorylation. Mice were treated with THZ1 10 mg kg⁻¹ once daily
or twice daily or vehicle control. The animals were killed and spleen tissues were 
isolated. Lysates prepared from homogenized spleen tissue were probed for 
RNAPII CTD phosphoepitopes. e, THZ1 binds directly to CDK7 in mouse 
tissues. Mice were treated with THZ1 10 mg kg⁻¹ once daily or twice daily or
vehicle control. The animals were killed and spleen tissues were isolated.
Lysates prepared from homogenized spleen tissue were incubated with bio-THZ1
for 12 h at 4 °C and 2 h at room temperature to induce covalent
bond formation. Proteins pulled down were then probed for the presence of
CDK7. f, Body weights of mice treated with either vehicle control, 10 mg kg⁻¹
THZ1 once daily, or 10 mg kg⁻¹ THZ1 twice daily over the duration of the
drug treatment. 
Extended Data Figure 9 | THZ1 inhibits RNAPII CTD phosphorylation
and causes cell-cycle arrest in non-transformed cell lines. a, b, THZ1 inhibits 
RNAPII CTD phosphorylation. RPE-1 (a) and BJ fibroblasts (b) were treated 
with THZ1 or THZ1-R for 4 h. Cellular lysates were then probed with 
antibodies against the indicated proteins. c, d, THZ1 causes cell-cycle arrest in 
non-transformed cells. RPE-1 (c) and BJ fibroblast (d) cells were treated with 
THZ1, Flavopiridol, Staurosporine or DMSO vehicle for the indicated time
periods. Cell-cycle progression was analysed after permeabilization and 
staining with propidium iodide. e, f, THZ1 inhibits proliferation of non- 
transformed cell lines. RPE-1 (e) and BJ fibroblast (f) cells were treated with 
THZ1, THZ1-R, Flavopiridol or Staurosporine for 72 h and antiproliferative
effects were determined by CellTiter Glo. Error bars show ± s.d.
Extended Data Figure 10 | High-dose THZ1 reduces global steady-state
mRNA levels, but low-dose THZ1 preferentially downregulates components
of the TAL1/RUNX1/GATA3 transcriptional circuit. a, THZ1, but not
THZ1-R, causes global downregulation of steady-state mRNA levels. Jurkat 
cells were treated with THZ1 (250 nM) or THZ1-R (250 nM) for 4 h. Total
RNA was isolated and ERCC spike-in controls were added relative to cell 
number and analysed using Affymetrix PrimeView microarrays. Heatmaps 
displaying the log2 fold change in gene expression versus DMSO for 22,310
genes expressed in DMSO conditions at 6 h in THZ1 or THZ1-R. b, Total
H3K27ac ChIP-seq signal (length × density) in enhancer regions for all
stitched enhancers in Jurkat. Enhancers are ranked by increasing H3K27ac 
ChIP-seq signal. c, d, Gene tracks of H3K27ac (top), CDK7 (middle) and 
RNAPII (bottom) ChIP-seq occupancy at the TSS, gene body and enhancer 
regions of TAL1 (c) and MYB (d). e, THZ1 downregulates mRNA transcripts
of the TAL1/RUNX1/GATA3 transcriptional circuitry. Quantitative
polymerase chain reaction with reverse transcription (RT-qPCR) expression 
analysis in Jurkat cells of transcripts identified as downregulated after THZ1 
treatment. qPCR was carried out using Taqman probes according to the 
manufacturer’s protocol. All experiments shown were performed in biological 
triplicate with each individual biological sample qPCR amplified in
technical triplicate. Expression was normalized to ACTB, and fold change in
expression was calculated relative to DMSO. Error bars show ± s.d.
f, THZ1 treatment reduces the protein levels of TAL1/RUNX1/GATA3
transcriptional circuitry. Jurkat cells treated with THZ1 for the indicated 
time points were probed for the specified proteins. 
PATENT LAW


Finding a balance 


Scientists who decide to pursue a legal career can enjoy fresh 
challenges while staying connected to the research world. 


BY CAMERON WALKER 
After Jason Rutt filled out a career survey as a teenager, he
was given three job options: soldier,
farmer or patent attorney. At the time, says 
Rutt, all he knew about patents was that 
the Nobel-prizewinning physicist Albert 
Einstein had worked in a patent office. It was 
not until years later, while pursuing a PhD in 
synthetic organic chemistry at the University 
of Nottingham, UK, that Rutt thought about 
patents again. 

He spent six months of his PhD in the 
Nottingham laboratories of Boots, the inter- 
national pharmacy and health-and-beauty 
chain. While there, he learned about the 
company’s other departments, including 
the patent division, where attorneys secured 
patents for inventions and protected existing 
intellectual property from challenges. Rutt 
later returned to Boots to do research, but 
by then his desire to continue in research 
and development was waning. He decided 
to change tack: as a patent attorney, he real- 
ized, he would not only help clients to secure 
and protect intellectual property, but would 
also be able to work closely with researchers 
in many fields of science. 


EASY SWITCH 
The transition to law was fairly simple for 
Rutt, because he did not need a law degree, 
just on-the-job training and professional 
qualifications. By coincidence, a trainee 
position opened up in the patent department 
at Boots at the same time that Rutt decided 
he was interested in the field. He got the job. 
Now, as head of patents in the London office 
of the international intellectual-property 
firm Rouse, he works with pharmaceutical 
companies and start-up ventures in areas 
such as gene therapy and diagnostics. 

Scientists move away from hands-on 
research and into the legal field for a variety 
of reasons. Some have a long-standing inter- 
est in law and policy; others believe that a 
career in law will allow them to be stronger 
advocates for research than they could be as 
scientists. Some consider a career in patent 
law because it dovetails with many scientific 
fields and will enable them to remain close 
to the research world. Patent attorneys may 
work directly with researchers to learn more 
about a client’s techniques and inventions — 
they often hear about the latest ideas long 
before the research is published. 

Becoming a patent attorney is not the only option for
scientists who want to enter the legal field.
Researchers with health or epidemiology
backgrounds may choose to work as a legal 
counsel for public-health or regulatory agen- 
cies as the organizations apply for medical- 
research funding or examine bioethical 
issues. Graduates in environmental science 
or ecology may find positions helping gov- 
ernment agencies to develop energy regula- 
tions or petitioning for endangered status for 
various species on behalf of environmental 
non-profit organizations (see ‘From the lab 
to the law’). 

While studying the ecology of artificial 
reefs as part of a master’s degree in environ- 
mental management, Margaret Peloso took 
classes in environmental law. That made her 
realize that she wanted to use her scientific 
background to develop policy, and she went 
on to pursue a PhD that combined science 
and policy while getting her law degree. She 
now focuses on environmental law and cli- 
mate change in the Washington DC office 
of Vinson & Elkins, an international law 
firm specializing in energy and finance. 
Her work involves developing the firm’s 
climate-change practice — she researches 
legal aspects of climate change and educates 
clients about potential risks and benefits, 
which fits in with her PhD on legal and pol- 
icy issues surrounding sea-level rise. 


WORLD CLASS 

Qualifying as a legal professional varies from 
country to country. In the United Kingdom, 
prospective lawyers usually take an under- 
graduate degree in law. People who already 
have a degree — including one in science — 
can take a one-year postgraduate diploma 
in law. In both cases, extra coursework
and training are required before being able
to practise as a barrister or solicitor. But a 
law degree is not required for people with 
science and engineering backgrounds who 
want to become patent attorneys or to bring 
inventions to the European Patent Office 
(EPO). Instead, they do several years of on- 
the-job training, then sit national or EPO 
examinations. 

In the United States, most of those wishing 
to become a practising attorney must gain 
an undergraduate degree and a law degree. 
After receiving their juris doctor (JD), they 
must then pass a state bar exam. (In a hand- 
ful of states, it is possible to qualify to take the 
state bar exam after completing a lengthy legal 
apprenticeship.) 

Yet the United States still has entry routes 
that do not require a law degree. Law firms 
often hire people with strong science back- 
grounds as technical specialists to help 
the firm to prepare patent applications for 
its technologies. These ‘tech specs’ often 
work directly with researchers and inven- 
tors to learn about their work, examine 
the scientific literature to find out whether 
similar techniques or ideas have already 
FROM THE LAB TO THE LAW


A science background opens doors in the legal world 



Shifting from a science-focused track to a 
legal career can be a bit bumpy, from the 
challenges of learning the language of law 
to studying for examinations. But it will be 
smoother for those who have a clear view of 
their target. Attorney Collette Adkins Giese in 
Minneapolis, Minnesota, had initially wanted 
to make an impact in the conservation 
world through research and teaching. 

After earning a master’s degree in wildlife 
conservation and lecturing for several years, 
Giese realized that she wanted to be an 
advocate for environmental research. 

So she returned to the University of
Minnesota’s Twin Cities campus for a
joint juris doctor law degree and PhD
programme in conservation biology.

She spent her summer holidays as an 
intern at Earthjustice, an environmental- 
law organization, and for non-profit 
conservation groups, including Defenders 
of Wildlife in Washington DC. There, Giese 
made valuable connections and established 
her commitment to the legal and non- 
profit sectors — a double benefit because, 
she says, non-profits are often reluctant
to hire newly minted lawyers. Law-school
graduates do not always leave school with 
practical skills and could need on-the-job 
training, for which non-profit organizations 
may not have resources. 

But after earning her law degree in 2005, 
she clerked for a judge who presided over 
criminal and contract issues, as well as 
some environmental law. She honed her 
writing skills by drafting the judge’s orders 
and opinions. Then she spent several years 
at a large corporate law firm in Minneapolis 
that had an established training
programme for associates. With the firm, 
she worked on behalf of the plaintiffs in the 
1989 Exxon Valdez oil spill and represented 
environmental non-profit organizations 
through the firm’s pro bono programme. In 
her current role as a senior attorney for the 
Center for Biological Diversity in Tucson, 
Arizona, she focuses on protecting reptiles 
and amphibians. 
For Giese, working as an attorney is 
the best way for her to use her scientific 
knowledge. “I really love the law, and I see
how it changes and can make an impact.
Important things get decided in courts,” she
says, “and I just want to be a part of that.”
Another scientist-turned-lawyer is 
Heriberto Moreno. Moreno became 
interested in intellectual property as an 
undergraduate student at the University of 
Puerto Rico, where he had taken an elective 
class in technology transfer. 
The class had given him the idea of 
training as a patent attorney. Towards 
the end of his microbiology PhD at the 
University of Virginia in Charlottesville, he 
perused the university’s alumni database to 
find scientists-turned-patent attorneys and 
contacted them to learn more about their 
jobs. He also visited the school’s career 
centre and looked online to find law firms 
that offer programmes and positions for 
technical specialists. 
One that captured his interest was 
an intellectual-property firm based 
in Washington DC. Moreno contacted 
the head of the firm’s biotechnology 
division and ended up working there as 
a technical specialist on patents in areas 
including nutraceuticals — products such 
as dietary and herbal supplements — 
pharmaceuticals and green technology. 
Over time, he took on more responsibility 
in drafting patent applications and realized 
that the next logical step was to become an 
attorney. 
Moreno is now in his final year at 
Boston University’s School of Law in 
Massachusetts. He also works as a 
technical specialist in areas including 
pharmaceuticals and biologics at the 
Boston office of the law firm McCarter 
& English. He has had to adapt to being 
older than most of his fellow students and 
managing the challenges of full-time study. 
But his mindset is different now that he has 
had a taste of his future career as a patent 
attorney. One of the best parts of patent law, 
he says, is its variety. “You just never know 
what you'll be working on.” C.W. 
been published and determine whether
an innovation overlaps with technologies 
that have already been patented. Research- 
ers who pass the US Patent and Trademark 
Office registration exam can then develop 
and file patent applications — although 
they cannot advise clients on legal issues 
or go to court if a patent is infringed. 

If a scientist earns a JD and passes both 
patent and state bar exams, she or he can 
work in patent law and develop a full- 
service intellectual-property practice, 
which may involve working with clients 
on trademarks, copyrights and technology 
licensing. Scientist-attorneys can also help 
clients, particularly start-up firms, with 
issues such as entity formation, employ- 
ment agreements and general legal ser- 
vices, and can practise in other fields of law 
as well. In some cases, employers will reim- 
burse tuition fees or provide other forms of 
support to tech specs or patent agents who 
attend law school while working. 


BACK TO SCHOOL 

The idea of returning to higher educa- 
tion can be daunting for an early-career 
researcher, says Dianne Nicol, a law pro- 
fessor and deputy director of the Centre 
for Law and Genetics at the University of 
Tasmania in Hobart, Australia. But, she 
says, the extra training can help to develop 
one’s career. 

After earning a PhD in cell biology from 
Dalhousie University in Halifax, Canada, 
and a law degree in Tasmania, Nicol spent 
several years in a private legal practice, 
working on intellectual property as well 
as on contract work and personal-injury 
litigation. She now researches and writes 
about issues such as gene patenting, the 
privacy of genetic information, regula- 
tions underlying biobanking and direct-to- 
consumer genetic testing. Her research has 
been used to inform government reports 
on genetic privacy and health issues sur- 
rounding genetic patenting, and she was 
recently appointed to a three-member 
panel that reviews pharmaceutical patents. 
Nicol relies heavily on both her science 
background — from the fundamentals of 
genetics she learned as an undergraduate to 
her postgraduate training — and her legal 
education. “Even though it sounds like 
a long and laborious process, it’s worth- 
while,” she says. 

The lingering effects of the global reces-
sion have meant that job prospects for
fledgling US lawyers are gloomy: the Ameri-
can Bar Association announced this year
that just 57% of the 46,776 people who 
received a JD in 2013 — the largest number 
of new JDs ever — had found long-term, 
full-time jobs in law that had required 
them to pass the bar examination. But the 
news is not all bad. People with science
backgrounds may have an edge on their
peers, both in terms of law-school admis- 
sion and in finding gainful employment 
after graduating, says Joy Baker Peacock, 
assistant director of the High Tech Law 
Institute at Santa Clara University School 
of Law in California. Many law schools and 
institutes are keen on training students 
who have physics, engineering and com- 
puter-science backgrounds — once quali- 
fied, such candidates may be appealing to 
employers in areas including semiconduc- 
tors, photovoltaics and nanotechnology. 

Peacock says that attorneys with a PhD in 
the life sciences are valuable in the biotech- 
nology patent-prosecution field because 
they will have the necessary knowledge 
to work with clients and the officials on 
protecting intellectual property. Although 
litigators — attorneys who usually work 
for plaintiffs and defendants in patent- 
infringement cases — do not require a 
science background, she says, it can be an 
advantage. “Firms like to hire people that 
have a strong grounding in technical and 
scientific matters so that they will be able 
to get up to speed more quickly on the tech- 
nical aspects of pat- 
ent-infringement 
cases.” 

Combining sci- 
ence and law can 
be a way to chan- 
nel a passion for 
science into real- 
world impact and 
personal satisfac- 
tion. “I know sci- 
entists strive to 
be unbiased,” says 
mals and species
for the non-profit 
Center for Biologi- 
cal Diversity, based 
in Tucson, Arizona. In 2012, Giese filed the 
largest-ever petition involving reptiles and 
amphibians to the US Fish and Wildlife 
Service to protect 53 species under the US 
Endangered Species Act. It was a 450-page 
document that involved nearly a year of 
preparation, including literature reviews 
and discussions with experts. 

As a lawyer, she is now free to be an 
advocate for her interests. “You can't hide 
behind the notion that you're just present- 
ing the facts anymore,” she says. “For me, 
that was such a relief.”




Cameron Walker is a freelance writer in 
Santa Barbara, California. 
EDUCATION


Graduate skills survey 


The Council of Graduate Schools in 
Washington DC is examining the 
professional-development requirements 
of PhD and master’s students in science, 
technology, engineering and mathematics 
(STEM) programmes. The council will 
survey 500 member institutions and 
interview industry leaders to determine 
which skills are most important for STEM 
graduates and which remain unaddressed 
in US graduate programmes. Daniel 
Denecke, the council’s associate vice- 
president for programmes, says that the 
study, which is funded by a US$298,100 
grant from the National Science 
Foundation, is focusing on industrial 
employers because they are the most likely 
to hire STEM graduates. Results will be 
available by summer 2016. 


FUNDING 


Marion Mason award 


A US$2.2-million bequest from the estate
of a venerated US chemist will support
early-career female chemists over the next 
20 years. Recipients of the Marion Milligan 
Mason Award for Women in the Chemical 
Sciences will receive $50,000, which
may be used for laboratory supplies and
equipment; publication costs; computer 
and technical support; and attendance at 
meetings. Applicants must have a tenure- 
track post at a US PhD-granting institution 
and must be US-born, naturalized citizens 
or permanent residents. The awards are 
administered by the American Association 
for the Advancement of Science in 
Washington DC. Applications are due
by 15 September, and winners will be
announced by May 2015. 


FINANCIAL OUTLOOK 
Continued squeezes 


US universities are likely to face continued 
financial pressures over the next
12–18 months, says a report by Moody’s
Investors Service in New York. Negative 
Outlook for US Higher Education Continues 
Even as Green Shoots of Stability Emerge 
predicts that competition for tuition 
revenue, federal grants and state funding 
will affect regional public universities 
most; prominent private universities with 
large endowments will perform well. The 
negative outlook means that Moody’s is 
more likely to give US universities poor 
credit ratings; as a result, they will incur 
higher borrowing costs and may have to 
cut back on hiring. 
SCIENCE FICTION


ONE OUT, ONE IN


BY AISLINN BATSTONE 


Anitta and her granddaughter walked
the avenue between snow-topped
graves on their way to the church 
for Mass. Copenhagen’s Assistens Cemetery 
seemed an appropriate place for a conversa- 
tion like this, even if Karin was being obstinate. 

“Mormor, there's no way I'll allow you to
do it,” Karin said.

“I want to. You're ready to become a 
mother?” 

“Mathias and I might adopt. If not, we'll 
wait in the queue. I have frozen eggs.” 

“Oh, a firstborn at age 70? That makes
sense. Like me being as healthy as a 60-year-old.”

“Good. You'll live for another 40 years.” 

“Not if it denies you children. I had three 
babies, you know.” Christian, so fat with 
his dark hair and high colour. Nikolaj with 
his thoughtful eyes and the way he always 
wanted to be close to her. And Karin’s 
mother Josefine, a noisy child whose shrieks 
made strangers smile. 

“It’s a different world, Mormor.”

“Not a different world. Just different ways.” 

Somewhere nearby, the philosopher and 
theologian Søren Kierkegaard lay sleeping in
the frozen earth. A scattering of snow deco- 
rated the evergreens and the bare branches. 
Through the trees they glimpsed the green 
copper spire of the Holy Cross Church. 

“The only thing I wish,” Anitta said, “is 
that I could see you with your child. I picture 
it all the time. You in the hospital bed with 
your baby in your arms.” 

“One day Mathias and I will have a baby. 
Maybe one who really needs us. Then you 
will have your wish.”

Anitta shook her head. There were plenty 
of couples and not enough orphans. Preg- 
nancy, birth, feeding a baby with her own 
body. Why should Karin miss out because 
her own generation was too selfish to die? 
“We won’t talk about it anymore. But if I
die, my ‘one out, one in’ goes to you.” Such a 
request was legally binding, she'd made sure 
of that. 

“Thank you,” Karin said, clearly relieved
that the conversation was over. 


Karin felt sick to her stomach the day she had 
her contraceptive implant removed. They’d
gone from 237,000th in the queue to top of 
the list in one leap. 

To think that she and Mathias would now 
be actively trying to make a new person. A 
The greatest gift.


new life that would start as a few cells in her 
womb and grow into an individual, a human 
being who would change her life. 

She still felt shaky when she thought about 
what her grandmother had done for her. And 
now this nausea without even an embryo to 
cause it. Morning sickness. Mourning sick- 
ness. Traumatic to be handed the pot of a 
loved one’s ashes and told that they had died 
so you could live a 
fuller life. A small 
comfort that the 
pot was her grand- 
mother’s favourite 
shade of blue. Had 
Mormor realized that 
Karin’s life from now 
on would be joy and 
regret entwined like a 
helix of DNA? 

A child! What 
would she do with 
a child? She only 
came across them 
every now and then, 
quiet little things sur- 
rounded by adults. 
Centenarians had 
taken over the playgrounds, whooping and 
swinging and sliding with their titanium 
hips and their fresh-grown organs. 

“Ow!” Karin winced as the doctor injected 
local anaesthetic in her upper arm. 

“Now hold still while I dig it out,” the 
doctor said. 


“So sweet. He looks like your uncle Nikolaj.” 
Josefine, Karin’s mother, curled her index 
finger into the newborn’s hand. “Can I take 
care of him sometimes?” 

“Of course, Mor. You're the third person 
who’s asked. Actually, can you look after him
on the night of our wedding anniversary? 
May 18th?” 

“I’m sorry, Eric is taking me skiing that
weekend.” 

Oh, that was just like her mother. But 
someone else would willingly babysit. “Help 
me feed him?” 

“Here. I’ll hold him while you get ready.
What are you going to call him?” 

“Maybe Nikolaj.” Karin accepted the 
bundle from her mother. She brought his 

head close to her 
her body as the baby sucked and brought
down her milk. “I wish Mormor could have 
met him.” 


“She looks so happy.” Anitta touched Karin’s
cheek on the screen. It was wet, like her own. 

She turned to face her guard. “The govern- 
ment has been extremely generous.” 

“It’s time now,” he said.

“I know.” Last year
they had interviewed 
her four times, ask- 
ing her about the 
gold cross she wore 
around her neck and 
whether she was sure 
she wanted to take 
such a step. My relationship with God is
personal. God wants 
what is best for me 
and my family. 

Anitta followed the 
guard out of the room 
and along the hospi- 
tal corridor. They 
would give her the 
injection in the comfortable
room she’d slept in since her ‘death’
over a year ago. They’d been so accommodating.
She was truly grateful.

A nurse’s warm hand covered hers, sooth- 
ing the fear that fluttered like a moth in her 
chest. There'd been worse moments in her 
life, sickening fears, mostly involving her 
children and their accidents. In comparison, 
this was a question mark, equal parts hope 
and fear. 

Death began with a beautiful dream. 

A small girl playing in the snow. A young 
woman at university. A ringing of church 
bells. A mother with her first child. 

Karin and her baby, dark-eyed Nikolaj. 
Anitta held him in her arms. She felt him 
grow chubby and tall. The smell of his hair. 
The softness of his skin. 

The sound of the bells, snow on the grave- 
tops. 

Love blossoming into unbearable ecstasy. 
Fireworks, noise. 

Silence. Dark. 

Peace. 

Perfect peace.


Aislinn Batstone is no longer a practising 
Catholic, scientist or philosopher, but she is a 
practising mother. Always practising, never 
perfect, but the kids seem to be turning out OK. 


JACEY 


